Thermus brockianus nucleic acid polymerases

ABSTRACT

The invention provides nucleic acids and polypeptides for nucleic acid polymerases from a thermophilic organism, Thermus brockianus. The invention also provides methods for using these nucleic acids and polypeptides.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Divisional application of U.S. Non-Provisional application Ser. No. 10/302,817, filed Nov. 22, 2002, which claims a priority benefit under 35 U.S.C. §119(e) from U.S. Patent Application No. 60/334,434, filed Nov. 30, 2001, and which is incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to nucleic acids and polypeptides for a nucleic acid polymerase isolated from a thermophilic organism, Thermus brockianus.

BACKGROUND OF THE INVENTION

DNA polymerases are naturally occurring intracellular enzymes used by a cell to replicate DNA by reading one nucleic acid strand and manufacturing its complement. Enzymes having DNA polymerase activity catalyze the formation of a bond between the 3′ hydroxyl group at the growing end of a nucleic acid primer and the 5′ phosphate group of a newly added nucleotide triphosphate. Nucleotide triphosphates used for DNA synthesis are usually deoxyadenosine triphosphate (A), deoxythymidine triphosphate (T), deoxycytidine triphosphate (C) and deoxyguanosine triphosphate (G), but modified or altered versions of these nucleotides can also be used. The order in which the nucleotides are added is dictated by the template strand, through hydrogen-bond formation between A and T nucleotide bases and between G and C nucleotide bases.
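As a minimal sketch of the template-directed base pairing just described, the Python function below (the name `synthesized_strand` is hypothetical, introduced only for illustration) returns the strand a polymerase would build on a given template. It assumes an idealized template made only of the four standard bases and ignores modified nucleotides. Because the new strand grows 5′→3′ antiparallel to the template, the result is the reverse complement of the template.

```python
# Watson-Crick pairing partners for the four standard DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_5to3: str) -> str:
    """Return, written 5'->3', the strand a polymerase would synthesize on this template.

    The new strand is antiparallel to the template, so the template is read
    from its 3' end and each base is replaced by its pairing partner.
    """
    return "".join(PAIR[base] for base in reversed(template_5to3.upper()))


print(synthesized_strand("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original template, reflecting the fact that each strand of a duplex fully specifies the other.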

Bacterial cells such as Escherichia coli contain at least three types of DNA polymerases, termed polymerase I, II and III. DNA polymerase I is the most abundant polymerase and is generally responsible for certain types of DNA repair, including a repair-like reaction that permits the joining of Okazaki fragments during DNA replication. Pol I is essential for the repair of DNA damage induced by UV irradiation and radiomimetic drugs. Pol II is thought to play a role in repairing DNA damage that induces the SOS response. In mutants that lack both pol I and III, pol II repairs UV-induced lesions. Pol I and II are monomeric polymerases, while pol III is a multisubunit complex.

Enzymes having DNA polymerase activity are often used in vitro for a variety of biochemical applications, including cDNA synthesis and DNA sequencing reactions. See Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd ed., Cold Spring Harbor Laboratory Press, 2001), hereby incorporated by reference. DNA polymerases are also used for amplification of nucleic acids by methods such as the polymerase chain reaction (PCR) (Mullis et al., U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159, incorporated by reference) and RNA transcription-mediated amplification methods (e.g., Kacian et al., PCT Publication No. WO91/01384, incorporated by reference).

DNA amplification uses cycles of primer extension by a DNA polymerase activity, each followed by thermal denaturation of the resulting double-stranded nucleic acid to provide a new template for another round of primer annealing and extension. Because the high temperatures necessary for strand denaturation irreversibly inactivate many DNA polymerases, the discovery and use of DNA polymerases able to remain active at temperatures above about 37°C provides an advantage in cost and labor efficiency, since such an enzyme need not be replenished after every denaturation step.

Thermostable DNA polymerases have been discovered in a number of thermophilic organisms, including Thermus aquaticus, Thermus thermophilus, and species within the genera Bacillus, Thermococcus, Sulfolobus, and Pyrococcus. A full-length thermostable DNA polymerase derived from Thermus aquaticus (Taq) has been described by Lawyer et al., J. Biol. Chem. 264:6427-6437 (1989), and Gelfand et al., U.S. Pat. No. 5,466,591. The cloning and expression of truncated versions of that DNA polymerase are further described in Lawyer et al., PCR Methods and Applications, 2:275-287 (1993), and Barnes, PCT Publication No. WO92/06188 (1992). Sullivan reports the cloning of a mutated version of the Taq DNA polymerase in EPO Publication No. 0482714A1 (1992). A DNA polymerase from Thermus thermophilus has also been cloned and expressed (Asakura et al., J. Ferment. Bioeng. (Japan), 74:265-269 (1993)). However, the properties of the various DNA polymerases vary. Accordingly, new DNA polymerases are needed that have improved sequence discrimination, better salt tolerance, varying degrees of thermostability, improved tolerance for labeled or dideoxy nucleotides, and other valuable properties.

SUMMARY OF THE INVENTION

The invention provides nucleic acid polymerases isolated from a thermophilic organism, Thermus brockianus, for example, from strains YS38 and 2AZN.

In one embodiment, the invention provides an isolated nucleic acid comprising SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7 or SEQ ID NO:8, and complementary nucleic acids. In another embodiment, the invention provides an isolated nucleic acid encoding a polypeptide comprising an amino acid sequence that has at least 96% identity to SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16. The invention also provides vectors comprising these isolated nucleic acids, including expression vectors comprising a promoter operably linked to these isolated nucleic acids. Host cells comprising such isolated nucleic acids and vectors are also provided by the invention, particularly host cells capable of expressing a thermostable polypeptide encoded by the nucleic acid, where the polypeptide has DNA polymerase activity.

The invention also provides isolated polypeptides that can include an amino acid sequence comprising any one of SEQ ID NO:9-49. The isolated polypeptides provided by the invention preferably are thermostable and have a DNA polymerase activity between 50,000 U/mg protein and 500,000 U/mg protein.
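Specific activity is units of polymerase activity divided by milligrams of protein. The sketch below computes it and checks the result against the 50,000 to 500,000 U/mg range stated above. The function names and the input numbers are hypothetical and for illustration only; the unit definition itself depends on the assay and is not given in this section.

```python
def specific_activity(total_units: float, protein_mg: float) -> float:
    """Specific activity in units of polymerase activity per mg of protein (U/mg)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return total_units / protein_mg


def in_preferred_range(u_per_mg: float, low: float = 50_000.0, high: float = 500_000.0) -> bool:
    """True if the specific activity lies within the 50,000-500,000 U/mg range."""
    return low <= u_per_mg <= high


# Hypothetical assay values, not data from this document.
sa = specific_activity(total_units=1_000_000, protein_mg=5)
print(sa, in_preferred_range(sa))  # 200000.0 True
```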

The invention further provides a method of synthesizing DNA that includes contacting a polypeptide comprising SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16 with a DNA under conditions sufficient to permit polymerization of DNA.

The invention further provides a method for thermocyclic amplification of nucleic acid that comprises contacting a nucleic acid with a thermostable polypeptide having SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16 under conditions suitable for amplification of the nucleic acid, and amplifying the nucleic acid. In general, one or more primers are included in the amplification mixture, where each primer can hybridize to a separate segment of the nucleic acid. Such amplification can include cycling the temperature to permit denaturation of nucleic acids, annealing of a primer to a template nucleic acid and polymerization of a nucleic acid complementary to the template nucleic acid. Amplification can be, for example, by Strand Displacement Amplification or Polymerase Chain Reaction.
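Assume an idealized reaction in which each cycle of denaturation, annealing and extension multiplies the target by a constant factor. Under that assumption, the copy number after n cycles is N0·(1 + E)^n, where E is the per-cycle efficiency (E = 1 means perfect doubling). The sketch below evaluates this model; the function name is hypothetical. The model deliberately ignores the plateau that real reactions reach, for example as reagents are consumed.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of thermocyclic amplification.

    Each cycle multiplies the target by (1 + efficiency); efficiency = 1.0
    means perfect doubling. Saturation effects are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(100, 30))  # 107374182400.0
```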

The invention also provides a method of primer extending DNA comprising contacting a polypeptide comprising SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16 with a DNA and a primer capable of hybridizing to a segment of the DNA under conditions sufficient to permit polymerization of DNA. Such primer extension can be performed, for example, to sequence DNA or to amplify DNA.
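The primer-extension step can be modeled with strings under several simplifying assumptions. The template and primer contain only A, C, G and T. The primer is perfectly complementary to the first matching site found in the template. The polymerase extends all the way to the template's 5′ end. The helper names below are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Full-length extension product (5'->3') of a primer annealed to a template.

    The primer anneals where its reverse complement occurs in the template
    (first match only). The polymerase then adds nucleotides to the primer's
    3' end, copying the template toward the template's 5' end.
    """
    template = template_5to3.upper()
    site = template.find(revcomp(primer_5to3))
    if site == -1:
        raise ValueError("primer does not hybridize to this template")
    # The primer covers template[site:end]; the product is complementary to template[:end].
    end = site + len(primer_5to3)
    return revcomp(template[:end])


product = extend_primer("GATTACACCGT", "ACGG")
print(product)  # ACGGTGTAATC
```

The product begins with the primer sequence itself, as expected for synthesis that proceeds from the primer's 3′ end.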

The invention further provides a method of making a DNA polymerase comprising SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16. The method comprises incubating a host cell under conditions sufficient for RNA transcription and translation, wherein the host cell comprises a nucleic acid that encodes a polypeptide comprising SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16 operably linked to a promoter. In one embodiment, the method uses a nucleic acid that comprises SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8. The invention is also directed to a DNA polymerase made by this method.

The invention also provides a kit that includes a container containing a DNA polymerase that has an amino acid sequence comprising SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16. The kit can also contain an unlabeled nucleotide, a labeled nucleotide, a balanced mixture of nucleotides, a chain terminating nucleotide, a nucleotide analog, a buffer solution, a solution containing magnesium, a cloning vector, a restriction endonuclease, a sequencing primer, a solution containing reverse transcriptase, or a DNA or RNA amplification primer. Such kits can, for example, be adapted for performing DNA sequencing, DNA amplification, RNA amplification or primer extension reactions.

DESCRIPTION OF THE FIGURE

FIG. 1 provides an alignment of nucleic acid polymerase nucleic acids from Thermus brockianus strains 2AZN and YS38. Two codon differences exist between these strains. One is silent, and the other is a C/T difference at position 1637 (indicated in boldface) that changes the amino acid at position 546 between proline (C) and leucine (T).
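The position arithmetic in FIG. 1 can be checked with a short sketch, assuming that nucleotide 1 is the first base of the start codon. Under that assumption, position 1637 falls at the second (middle) base of codon 546. The only leucine and proline codons that differ solely at the middle base are CTN (Leu) and CCN (Pro), which is consistent with a C/T difference at that position. The third base of the actual T. brockianus codon is not stated here, so the sketch enumerates all four possibilities. The function names are hypothetical.

```python
from __future__ import annotations


def codon_of(nt_position: int) -> tuple[int, int]:
    """Map a 1-based nucleotide position in a coding sequence to
    (1-based codon number, 1-based position within that codon).

    Assumes nucleotide 1 is the first base of the start codon.
    """
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


# Standard genetic code: all leucine and proline codons.
LEU = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
PRO = {"CCT", "CCC", "CCA", "CCG"}


def middle_base_leu_pro_pairs() -> list[tuple[str, str]]:
    """All (Leu, Pro) codon pairs that differ only at the second position."""
    return sorted(
        (leu, pro)
        for leu in LEU
        for pro in PRO
        if leu[0] == pro[0] and leu[2] == pro[2] and leu[1] != pro[1]
    )


print(codon_of(1637))               # (546, 2)
print(middle_base_leu_pro_pairs())  # [('CTA', 'CCA'), ('CTC', 'CCC'), ('CTG', 'CCG'), ('CTT', 'CCT')]
```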

FIG. 2 provides a comparison of amino acid sequences for polymerases from Thermus aquaticus (Taq), Thermus thermophilus (Tth), Thermus filiformis (Tfi), Thermus flavus (Tfl), Thermus brockianus strain YS38 (Tbr YS38) and Thermus brockianus strain 2AZN (Tbr 2AZN).

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to nucleic acid and amino acid sequences encoding nucleic acid polymerases from thermophilic organisms. In particular, the present invention provides nucleic acid polymerases from Thermus brockianus. The nucleic acid polymerases of the invention can be used in a variety of procedures, including DNA primer extension, DNA sequencing, reverse transcription and DNA amplification procedures.

Definitions

The term “amino acid sequence” refers to the positional arrangement and identity of amino acids in a peptide, polypeptide or protein molecule. Use of the term “amino acid sequence” is not meant to limit the amino acid sequence to the complete, native amino acid sequence of a peptide, polypeptide or protein.

“Chimeric” is used to indicate that a DNA sequence, such as a vector or a gene, is composed of more than one DNA sequence of distinct origin, fused by recombinant DNA techniques into a longer DNA sequence that does not occur naturally.

The term “coding region” refers to the nucleic acid segment that codes for a protein of interest. The coding region of a protein is bounded on the 5′ side by the nucleotide triplet “ATG” that encodes the initiator methionine and on the 3′ side by one of the three triplets that specify stop codons (i.e., TAA, TAG, TGA).
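The sketch below applies the definition above to a forward-strand DNA string. It finds an ATG and returns the sequence from that ATG through the first in-frame stop codon. It follows the ATG-only definition given here and does not consider alternative start codons, the reverse strand, or introns. The function name is hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_coding_region(seq: str) -> Optional[str]:
    """Return the first ATG...stop stretch (stop codon included) on the given strand,
    or None if no ATG is followed by an in-frame stop codon."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop for this ATG; try the next one.
        start = seq.find("ATG", start + 1)
    return None


print(first_coding_region("CCATGAAATTTTAGGG"))  # ATGAAATTTTAG
```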

“Constitutive expression” refers to expression using a constitutive promoter.

“Constitutive promoter” refers to a promoter that is able to express the gene that it controls in all, or nearly all, phases of the life cycle of the cell.

“Complementary” or “complementarity” are used to define the degree of base-pairing or hybridization between nucleic acids. For example, as is known to one of skill in the art, adenine (A) can form hydrogen bonds or base pair with thymine (T), and guanine (G) can form hydrogen bonds or base pair with cytosine (C). Hence, A is complementary to T while G is complementary to C. Complementarity may be complete when all bases in a double-stranded nucleic acid are base paired. Alternatively, complementarity may be “partial,” when only some of the bases in a nucleic acid are matched according to the base pairing rules. The degree of complementarity between nucleic acid strands has an effect on the efficiency and strength of hybridization between nucleic acid strands.
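To make the distinction between complete and partial complementarity concrete, the sketch below scores two equal-length strands aligned antiparallel. The top strand is written 5′→3′ and the bottom strand 3′→5′ beneath it. The score is the percentage of positions that form A·T or G·C pairs. Real hybridization also depends on mismatch position, base stacking and reaction conditions, which this illustrative function (name hypothetical) ignores.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(top_5to3: str, bottom_3to5: str) -> float:
    """Percentage of positions that base-pair (A-T or G-C) when two equal-length
    strands are aligned antiparallel, bottom strand written 3'->5' under the top."""
    if not top_5to3 or len(top_5to3) != len(bottom_3to5):
        raise ValueError("expected two non-empty strands of equal length")
    pairs = sum(PAIR[a] == b for a, b in zip(top_5to3.upper(), bottom_3to5.upper()))
    return 100.0 * pairs / len(top_5to3)


print(percent_complementarity("ATGC", "TACG"))  # 100.0 (complete)
print(percent_complementarity("ATGC", "TACC"))  # 75.0 (partial)
```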

The “derivative” of a reference nucleic acid, protein, polypeptide or peptide is a nucleic acid, protein, polypeptide or peptide, respectively, with a related but different sequence or chemical structure than the respective reference nucleic acid, protein, polypeptide or peptide. A derivative nucleic acid, protein, polypeptide or peptide is generally made purposefully to enhance or incorporate some chemical, physical or functional property that is absent or only weakly present in the reference nucleic acid, protein, polypeptide or peptide. A derivative nucleic acid generally can differ in nucleotide sequence from a reference nucleic acid, whereas a derivative protein, polypeptide or peptide can differ in amino acid sequence from the reference protein, polypeptide or peptide, respectively. Such sequence differences can be one or more substitutions, insertions, additions, deletions, fusions and truncations, which can be present in any combination. Differences can be minor (e.g., a difference of one nucleotide or amino acid) or more substantial. However, the sequence of the derivative is not so different from the reference that one of skill in the art would not recognize that the derivative and reference are related in structure and/or function. Generally, differences are limited so that the reference and the derivative are closely similar overall and, in many regions, identical. A “variant” differs from a “derivative” nucleic acid, protein, polypeptide or peptide in that the variant can have silent structural differences that do not significantly change the chemical, physical or functional properties of the reference nucleic acid, protein, polypeptide or peptide. In contrast, the differences between the reference and derivative nucleic acid, protein, polypeptide or peptide are intentional changes made to improve one or more chemical, physical or functional properties of the reference nucleic acid, protein, polypeptide or peptide.

The terms “DNA polymerase activity,” “synthetic activity” and “polymerase activity” are used interchangeably and refer to the ability of an enzyme to synthesize new DNA strands by the incorporation of deoxynucleoside triphosphates. A protein that can direct the synthesis of new DNA strands by the incorporation of deoxynucleoside triphosphates in a template-dependent manner is said to be “capable of DNA synthetic activity.”

The term “5′ exonuclease activity” refers to the presence of an activity in a protein that is capable of removing nucleotides from the 5′ end of a nucleic acid.

The term “3′ exonuclease activity” refers to the presence of an activity in a protein that is capable of removing nucleotides from the 3′ end of a nucleic acid.

“Expression” refers to the transcription and/or translation of an endogenous or exogenous gene in an organism. Expression generally refers to the transcription and stable accumulation of mRNA. Expression may also refer to the production of protein.

“Expression cassette” means a nucleic acid sequence capable of directing expression of a particular nucleotide sequence. Expression cassettes generally comprise a promoter operably linked to the nucleotide sequence to be expressed (e.g., a coding region) that is operably linked to termination signals. Expression cassettes also typically comprise sequences required for proper translation of the nucleotide sequence. The expression cassette comprising the nucleotide sequence of interest may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components. The expression of the nucleotide sequence in the expression cassette may be under the control of a constitutive promoter or under control of an inducible promoter that initiates transcription only when the host cell is exposed to some particular external stimulus. In the case of a multicellular organism, the promoter can also be specific to a particular tissue or organ or stage of development.

The term “gene” is used broadly to refer to any segment of nucleic acid associated with a biological function. The term “gene” encompasses the coding region of a protein, polypeptide, peptide or structural RNA. The term “gene” also includes sequences up to a distance of about 2 kb on either end of a coding region. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′ flanking region may contain regulatory sequences such as promoters and enhancers or other recognition or binding sequences for proteins that control or influence the transcription of the gene. The 3′ flanking region may contain sequences that direct the termination of transcription, post-transcriptional cleavage and polyadenylation as well as recognition sequences for other proteins. A protein or polypeptide encoded in a gene can be full length or any portion thereof, so that all activities or functional properties are retained, or so that only selected activities (e.g., enzymatic activity, ligand binding, or signal transduction) of the full-length protein or polypeptide are retained. The protein or polypeptide can include any sequences necessary for the production of a proprotein or precursor polypeptide. The term “native gene” refers to a gene that is naturally present in the genome of an untransformed cell.

“Genome” refers to the complete genetic material that is naturally present in an organism and is transmitted from one generation to the next.

The terms “heterologous nucleic acid” or “exogenous nucleic acid” refer to a nucleic acid that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form. Thus, a heterologous gene in a host cell includes a gene that is endogenous to the particular host cell but has been modified through, for example, the use of DNA shuffling. The terms also include non-naturally occurring multiple copies of a naturally occurring nucleic acid. Thus, the terms refer to a nucleic acid segment that is foreign or heterologous to the cell, or normally found within the cell but in a position within the cell or genome where it is not ordinarily found.

The term “homology” refers to a degree of similarity between a nucleic acid and a reference nucleic acid or between a polypeptide and a reference polypeptide. Homology may be partial or complete. Complete homology indicates that the nucleic acid or amino acid sequences are identical. A partially homologous nucleic acid or amino acid sequence is one that is not identical to the reference nucleic acid or amino acid sequence. Hence, a partially homologous nucleic acid has one or more nucleotide differences in its sequence relative to the nucleic acid to which it is being compared. The degree of homology can be determined by sequence comparison. Alternatively, as is well understood by those skilled in the art, DNA-DNA or DNA-RNA hybridization, under various hybridization conditions, can provide an estimate of the degree of homology between nucleic acids (see, e.g., Hames and Higgins (eds.), Nucleic Acid Hybridization, IRL Press, Oxford, U.K.).

“Hybridization” refers to the process of annealing complementary nucleic acid strands by forming hydrogen bonds between nucleotide bases on the complementary nucleic acid strands. Hybridization, and the strength of the association between the nucleic acids, is affected by such factors as the degree of complementarity between the hybridizing nucleic acids, the stringency of the conditions involved, the Tm (melting temperature) of the formed hybrid, and the G:C ratio within the nucleic acids.
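For short oligonucleotides, the influence of G:C content on hybrid stability is often approximated by the Wallace rule, Tm ≈ 2°C × (A + T) + 4°C × (G + C), which weights G and C twice as heavily as A and T. That rule is a widely used rule of thumb, not something specified in this document. It is rough and applies only to short oligonucleotides (on the order of 20 nucleotides or fewer) under standard salt conditions. A sketch with hypothetical function names:

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligonucleotides only; it ignores salt,
    oligonucleotide concentration and nearest-neighbor effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


oligo = "ATGCATGCATGCATGCAT"  # hypothetical 18-mer
print(wallace_tm(oligo))  # 52
```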

“Inducible promoter” refers to a regulated promoter that can be turned on in one or more cell types by an external stimulus, such as a chemical, light, hormone, stress, temperature or a pathogen.

An “initiation site” is a region surrounding the position of the first nucleotide that is part of the transcribed sequence, which is defined as position +1. All nucleotide positions of the gene are numbered by reference to the first nucleotide of the transcribed sequence, which resides within the initiation site. Downstream sequences (i.e., sequences in the 3′ direction) are denominated positive, while upstream sequences (i.e., sequences in the 5′ direction) are denominated negative.
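This numbering convention can be expressed as a small conversion from a 0-based index in a sequence string to a position relative to the transcription start. The sketch assumes the common convention that there is no position 0, so the base immediately upstream of +1 is -1. That convention is not stated explicitly above, and the function name is hypothetical.

```python
def relative_position(index: int, tss_index: int) -> int:
    """Convert a 0-based string index to numbering relative to the transcription start.

    `tss_index` is the 0-based index of the first transcribed nucleotide (+1).
    Downstream bases are positive; upstream bases are negative, with no position 0.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


# Hypothetical layout: transcription starts at string index 10.
print([relative_position(i, 10) for i in (8, 9, 10, 11)])  # [-2, -1, 1, 2]
```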

An “isolated” or “purified” nucleic acid or an “isolated” or “purified” polypeptide is a nucleic acid or polypeptide that, by the hand of man, exists apart from its native environment and is therefore not a product of nature. An isolated nucleic acid or polypeptide may exist in a purified form or may exist in a non-native environment such as, for example, within a transgenic host cell.

The term “invader oligonucleotide” refers to an oligonucleotide that contains sequences at its 3′ end that are substantially the same as sequences located at the 5′ end of a probe oligonucleotide. These regions will compete for hybridization to the same segment along a complementary target nucleic acid.
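As an idealized illustration of this overlap, the sketch below finds the longest 3′ segment of an invader oligonucleotide that is identical to a 5′ segment of a probe. It assumes exact identity, whereas the definition requires only that the sequences be substantially the same. The function name and example sequences are hypothetical.

```python
def invader_probe_overlap(invader_5to3: str, probe_5to3: str) -> int:
    """Length of the longest 3' segment of the invader that is identical to a
    5' segment of the probe (0 if none)."""
    inv, prb = invader_5to3.upper(), probe_5to3.upper()
    # Try the longest possible overlap first and shrink until a match is found.
    for k in range(min(len(inv), len(prb)), 0, -1):
        if inv[-k:] == prb[:k]:
            return k
    return 0


print(invader_probe_overlap("CCGTAAGT", "AAGTCTTG"))  # 4
```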

The term “label” refers to any atom or molecule that can be used to provide a detectable (preferably quantifiable) signal, and that can be attached to a nucleic acid or protein. Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, and the like.

The term “nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, composed of monomers (nucleotides) containing a sugar, phosphate and a base that is either a purine or pyrimidine. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences as well as the reference sequence explicitly indicated.

The term “oligonucleotide” as used herein is defined as a molecule composed of two or more deoxyribonucleotides or ribonucleotides, preferably more than three, and usually more than ten or fifteen. There is no precise upper limit on the size of an oligonucleotide. However, in general, an oligonucleotide is shorter than about 250 nucleotides, preferably shorter than about 200 nucleotides and more preferably shorter than about 100 nucleotides. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.

The terms “open reading frame” and “ORF” refer to the amino acid sequence encoded between translation initiation and termination codons of a coding sequence. The terms “initiation codon” and “termination codon” refer to a unit of three adjacent nucleotides (‘codon’) in a coding sequence that specifies initiation and chain termination, respectively, of protein synthesis (mRNA translation).

“Operably linked” means joined as part of the same nucleic acid molecule, so that the function of one is affected by the other. In general, “operably linked” also means that two or more nucleic acids are suitably positioned and oriented so that they can function together. Nucleic acids are often operably linked to permit transcription of a coding region to be initiated from the promoter. For example, a regulatory sequence is said to be “operably linked to” or “associated with” a DNA sequence that codes for an RNA or a polypeptide if the two sequences are situated such that the regulatory sequence affects expression of the coding region (i.e., that the coding sequence or functional RNA is under the transcriptional control of the promoter). Coding regions can be operably linked to regulatory sequences in sense or antisense orientation.

The term “probe oligonucleotide” refers to an oligonucleotide that interacts with a target nucleic acid to form a cleavage structure in the presence or absence of an invader oligonucleotide. When annealed to the target nucleic acid, the probe oligonucleotide and target form a cleavage structure and cleavage occurs within the probe oligonucleotide. The presence of an invader oligonucleotide upstream of the probe oligonucleotide can shift the site of cleavage within the probe oligonucleotide (relative to the site of cleavage in the absence of the invader).

“Promoter” refers to a nucleotide sequence, usually upstream (5′) of a coding region, that controls the expression of the coding region by providing the recognition site for RNA polymerase and other factors required for proper transcription. “Promoter” includes, but is not limited to, a minimal promoter, which is a short DNA sequence composed of a TATA box. A promoter can also include other sequences that serve to specify the site of transcription initiation and control or regulate expression, for example, enhancers. An “enhancer” is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. It is capable of operating in both orientations (normal or flipped), and is capable of functioning even when moved either upstream or downstream from the promoter. Promoters may be derived in their entirety from a native gene, be composed of different elements derived from different promoters found in nature, or even be composed of synthetic DNA segments. A promoter may also contain DNA sequences that are involved in the binding of protein factors that control the effectiveness of transcription initiation in response to physiological or developmental conditions.

The terms “protein,” “peptide” and “polypeptide” are used interchangeably herein.

“Regulatory sequences” and “regulatory elements” refer to nucleotide sequences that control some aspect of the expression of nucleic acid sequences. Such sequences or elements can be located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence. “Regulatory sequences” and “regulatory elements” influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, introns, promoters, polyadenylation signal sequences, splicing signals, termination signals, and translation leader sequences. Regulatory sequences also include natural and synthetic sequences.

As used herein, the term “selectable marker” refers to a gene that encodes an observable or selectable trait that is expressed and can be detected in an organism having that gene. Selectable markers are often linked to a nucleic acid of interest that may not encode an observable trait in order to trace or select for the presence of the nucleic acid of interest. Any selectable marker known to one of skill in the art can be used with the nucleic acids of the invention. Some selectable markers allow the host to survive under circumstances where, without the marker, the host would otherwise die. Examples of selectable markers include antibiotic resistance, for example, tetracycline or ampicillin resistance.

As used herein, the term “stringency” is used to define the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Under “high stringency” conditions, nucleic acid base pairing will occur only between nucleic acids that have a high frequency of complementary base sequences. Under “weak” or “low” stringency conditions, the frequency of complementary sequences required for base pairing is usually lower, so that nucleic acids with differing sequences can be detected and/or isolated.

The terms “substantially similar” and “substantially homologous” refer to nucleotide and amino acid sequences that represent functional equivalents of the instant inventive sequences. For example, altered nucleotide sequences that simply reflect the degeneracy of the genetic code but nonetheless encode amino acid sequences that are identical to the inventive amino acid sequences are substantially similar to the inventive sequences. In addition, amino acid sequences that are substantially similar to the instant sequences are those wherein overall amino acid identity is sufficient to provide an active, thermally stable DNA polymerase I. For example, amino acid sequences that are substantially similar to the sequences of the invention are those wherein the overall amino acid identity is 80% or greater, preferably 90% or greater, such as 91%, 92%, 93%, or 94%, and more preferably 95% identity or greater, such as 96%, 97%, 98%, or 99% identity, relative to the amino acid sequences of the invention.
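Percent identity can be computed once two sequences have been aligned. The sketch below assumes a pre-computed alignment of equal length, counts gap positions ('-') as non-identical, and divides by the full alignment length. This is only one of several conventions in use, and alignment tools may report different values for the same pair. The result is then mapped onto the 80%, 90% and 95% thresholds listed above. The function names and the toy sequences are hypothetical and are not the sequences of the invention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Positions where either sequence has a gap ('-') count as non-identical;
    the denominator is the full alignment length.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("expected two non-empty sequences of equal aligned length")
    same = sum(x == y and x != "-" for x, y in zip(aligned_a.upper(), aligned_b.upper()))
    return 100.0 * same / len(aligned_a)


def identity_tier(pid: float) -> str:
    """Map percent identity onto the 80% / 90% / 95% thresholds."""
    if pid >= 95.0:
        return ">=95%"
    if pid >= 90.0:
        return ">=90%"
    if pid >= 80.0:
        return ">=80%"
    return "<80%"


# Toy 100-residue sequences (hypothetical) that differ at 3 positions.
ref = "ACDEFGHIKL" * 10
alt = "ACDEFGHIKL" * 9 + "ACDEFGHAAA"
pid = percent_identity(ref, alt)
print(pid, identity_tier(pid))  # 97.0 >=95%
```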

A “terminating agent,” “terminating nucleotide” or “terminator” in relation to DNA synthesis or sequencing refers to compounds capable of specifically terminating a DNA sequencing reaction at a specific base. Such compounds include, but are not limited to, dideoxynucleosides having a 2′,3′ dideoxy structure (e.g., ddATP, ddCTP, ddGTP and ddTTP).
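A dideoxynucleotide lacks the 3′ hydroxyl group needed to form the next bond. As a result, each incorporation of, say, ddATP ends the chain at an A, and across many molecules this yields a set of fragments whose lengths mark every position of that base. The sketch below assumes an idealized reaction in which every possible termination position is represented. It shows how sorting the fragments from the four ddNTP reactions by length reads back the extended sequence. The function names and the example sequence are hypothetical.

```python
from __future__ import annotations


def terminated_lengths(product_5to3: str, primer_len: int, dd_base: str) -> list[int]:
    """Lengths of chain-terminated products in a reaction containing the dideoxy form of `dd_base`.

    Every position after the primer where `dd_base` is incorporated can end a
    chain, giving one product whose length runs up to and including that position.
    """
    target = dd_base.upper()
    return [i + 1 for i in range(primer_len, len(product_5to3)) if product_5to3[i].upper() == target]


def read_ladder(product_5to3: str, primer_len: int) -> str:
    """Read the extended sequence by sorting fragments from all four ddNTP reactions by length."""
    calls = [
        (length, base)
        for base in "ACGT"
        for length in terminated_lengths(product_5to3, primer_len, base)
    ]
    return "".join(base for _, base in sorted(calls))


product = "ACGGTGTAATC"  # hypothetical: 4-nt primer ACGG plus 7 added nucleotides
print(terminated_lengths(product, 4, "A"))  # [8, 9]
print(read_ladder(product, 4))  # TGTAATC
```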

“Thermostable” means that a nucleic acid polymerase remains active at a temperature greater than about 37°C. Preferably, the nucleic acid polymerases of the invention remain active at a temperature greater than about 42°C. More preferably, the nucleic acid polymerases of the invention remain active at a temperature greater than about 50°C. Even more preferably, the nucleic acid polymerases of the invention remain active after exposure to a temperature greater than about 60°C. Most preferably, the nucleic acid polymerases of the invention remain active despite exposure to a temperature greater than about 70°C.

A “transgene” refers to a gene that has been introduced into the genome by transformation and is stably maintained. Transgenes may include, for example, genes that are either heterologous or homologous to the genes of a particular organism to be transformed. Additionally, transgenes may comprise native genes inserted into a non-native organism, or chimeric genes. The term “endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” or “exogenous” gene refers to a gene not normally found in the host organism but that is introduced by gene transfer.

The term “transformation” refers to the transfer of a nucleic acid fragment into the genome of a host cell, resulting in genetically stable inheritance. Host cells containing the transformed nucleic acid fragments are referred to as “transgenic” cells, and organisms comprising transgenic cells are referred to as “transgenic organisms.” Transformation may be accomplished by a variety of means known in the art, including calcium phosphate-DNA co-precipitation, electroporation, viral infection, and the like.

The “variant” of a reference nucleic acid, protein, polypeptide or peptide is a nucleic acid, protein, polypeptide or peptide, respectively, with a related but different sequence than the respective reference nucleic acid, protein, polypeptide or peptide. The differences between variant and reference nucleic acids, proteins, polypeptides or peptides are silent or conservative differences. A variant nucleic acid differs from a reference nucleic acid in nucleotide sequence, whereas a variant protein, polypeptide or peptide differs in amino acid sequence from the reference protein, polypeptide or peptide, respectively. A variant and reference nucleic acid, protein, polypeptide or peptide may differ in sequence by one or more substitutions, insertions, additions, deletions, fusions and truncations, which may be present in any combination. Differences can be minor (e.g., a difference of one nucleotide or amino acid) or more substantial. However, the structure and function of the variant is not so different from the reference that one of skill in the art would not recognize that the variant and reference are related in structure and/or function. Generally, differences are limited so that the reference and the variant are closely similar overall and, in many regions, identical.

The term “vector” refers to a nucleic acid that can transfer one or more other nucleic acid segments into a cell. A “vector” includes, inter alia, any plasmid, cosmid, phage or nucleic acid in double- or single-stranded, linear or circular form that may or may not be self-transmissible or mobilizable. It can transform prokaryotic or eukaryotic host cells either by integration into the cellular genome or by existing extrachromosomally (e.g., an autonomously replicating plasmid with an origin of replication). Vectors used in bacterial systems often contain an origin of replication that allows the vector to replicate independently of the bacterial chromosome. The term “expression vector” refers to a vector containing an expression cassette.

The term “wild-type” refers to a gene or gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is the gene form most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “variant” or “derivative” refers to a gene or gene product that displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. Naturally occurring derivatives can be isolated; they are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.

Polymerase Nucleic Acids

The invention provides isolated nucleic acids encoding Thermus brockianus nucleic acid polymerases. It also provides derivative, fragment and variant nucleic acids thereof that encode active, thermally stable nucleic acid polymerases. Thus, one aspect of the invention includes the nucleic acid polymerases encoded by the polynucleotide sequences contained in Thermus brockianus strains YS38 and 2AZN. Any nucleic acid encoding the amino acid sequence of SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16 is also contemplated by the present invention. These are the amino acid sequences of the wild type and several derivative Thermus brockianus polymerases.

In one embodiment, the invention provides a nucleic acid of SEQ ID NO:1, a wild type Thermus brockianus nucleic acid encoding a nucleic acid polymerase from strain YS38:

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGGCT TCGGCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGAGTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCTCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTTCGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG GGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAG
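Each listing in this section uses the same row layout. A row starts with the 1-based position of its first base, followed by groups of 10 nucleotides. The minimal Python sketch below assumes only that layout. It converts a listing back into one contiguous sequence and runs a basic reading-frame check. `parse_listing` and `check_orf` are illustrative names and are not part of the disclosure.

The parser rejects any token that mixes digits or other characters into the bases. This catches character-recognition errors, such as a "1" printed where a "T" belongs.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_listing(text: str) -> str:
    """Join a numbered listing ("51 CCACCTGGCC ...") into one DNA string.

    A token may be a bare position number, a group of bases, or a position
    number fused to a group (e.g. "51CCACCTGGCC"); anything else is rejected.
    """
    parts = []
    for token in text.split():
        match = re.fullmatch(r"(\d*)([A-Za-z]*)", token)
        if match is None:
            raise ValueError(f"malformed token: {token!r}")
        parts.append(match.group(2).upper())
    seq = "".join(parts)
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        raise ValueError(f"non-nucleotide characters: {bad}")
    return seq


def check_orf(seq: str) -> dict:
    """Report simple reading-frame properties of a putative coding sequence."""
    usable = len(seq) - len(seq) % 3  # ignore any incomplete trailing codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    first_stop = next(
        (n for n, codon in enumerate(codons, start=1) if codon in STOP_CODONS),
        None,
    )
    return {
        "length": len(seq),
        "starts_with_atg": seq.startswith("ATG"),
        "length_multiple_of_3": len(seq) % 3 == 0,
        "first_stop_codon": first_stop,  # 1-based codon number, or None
    }


rows = "1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA\n51 CCACCTGGCC"
print(check_orf(parse_listing(rows)))
```

Passing a complete listing to `parse_listing` and then to `check_orf` reports where the first in-frame stop codon falls. It also shows whether any bases follow it.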

In another embodiment, the invention provides a nucleic acid of SEQ ID NO:2, another wild type Thermus brockianus nucleic acid encoding a nucleic acid polymerase, but from strain 2AZN:

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGGCT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCCCCT TCCCGCCGTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTTCGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGCAGG TCATGGAGGG AGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAGTCGAC

In another embodiment, the invention provides a nucleic acid from Thermus brockianus strain YS38 having SEQ ID NO:3. This is a derivative nucleic acid having GAC (encoding Asp) in place of GGC (encoding Gly) at positions 127-129. SEQ ID NO:3 is provided below:
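Positions 127-129 form a complete codon in the reading frame that begins at the ATG at position 1, because 126 is divisible by 3. Counting the initiating ATG as codon 1, this is codon 43. The sketch below applies an in-frame codon replacement and looks up the encoded residue. It uses the standard genetic code (Table 1) in one-letter form. The helper names are hypothetical; this is not a procedure taken from the disclosure. The test sequence is positions 1-130 of SEQ ID NO:1 as listed above.

```python
BASES = "TCAG"
# Standard genetic code, one-letter amino acids, "*" = stop; codons ordered
# TTT, TTC, TTA, TTG, TCT, ... (first base slowest, third base fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitute_codon(seq: str, start: int, new_codon: str) -> str:
    """Replace the in-frame codon whose first base is at 1-based `start`."""
    if (start - 1) % 3 != 0:
        raise ValueError("start must be the first base of an in-frame codon")
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    i = start - 1
    return seq[:i] + new_codon + seq[i + 3:]


def residue_at(seq: str, start: int) -> tuple:
    """Return (codon number with ATG = 1, one-letter residue) at `start`."""
    return (start - 1) // 3 + 1, CODON_TABLE[seq[start - 1:start + 2]]


# Positions 1-130 of SEQ ID NO:1 (wild type, strain YS38).
prefix = (
    "ATGCTTCCCCTCTTTGAGCCCAAGGGCCGGGTGCTCCTGGTGGACGGCCA"
    "CCACCTGGCCTACCGTAACTTCTTCGCCCTCAAGGGGCTCACCACGAGCC"
    "GGGGCGAGCCCGTGCAAGGGGTCTACGGCT"
)
mutant = substitute_codon(prefix, 127, "GAC")
print(residue_at(prefix, 127), residue_at(mutant, 127))  # (43, 'G') (43, 'D')
```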

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGACT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGCGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCCG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCTCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTTCGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG GGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAG

In another embodiment, the invention provides a derivative nucleic acid related to Thermus brockianus strain 2AZN having SEQ ID NO:4. SEQ ID NO:4 is a derivative nucleic acid having GAC (encoding Asp) in place of GGC (encoding Gly) at positions 127-129 and is provided below:

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGACT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTCGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCCCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTTCGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 CGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG AGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAGTCGAC

In another embodiment, the invention provides a derivative nucleic acid related to Thermus brockianus strain YS38, having SEQ ID NO:5. SEQ ID NO:5 is a derivative nucleic acid having TAC (encoding Tyr) in place of TTC (encoding Phe) at positions 1993-1995 and is provided below:

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGGCT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCTCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTACGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG GGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCGAAGGGC TAG

In another embodiment, the invention provides a derivative nucleic acid related to Thermus brockianus strain 2AZN, having SEQ ID NO:6. SEQ ID NO:6 is a derivative nucleic acid having TAC (encoding Tyr) in place of TTC (encoding Phe) at positions 1993-1995 and is provided below:

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGGCT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCCCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TCCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTACGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG AGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAGTCGAC

In another embodiment, the invention provides a derivative nucleic acid related to Thermus brockianus strain YS38 having SEQ ID NO:7. A nucleic acid having SEQ ID NO:7 has GAC (encoding Asp) in place of GGC (encoding Gly) at positions 127-129 and TAC (encoding Tyr) in place of TTC (encoding Phe) at positions 1993-1995.

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGACT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTGAAG GGCACCTACA TTGACCTCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTACGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCGAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG GGTGTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAG

In another embodiment, the invention provides a derivative nucleic acid related to Thermus brockianus strain 2AZN having SEQ ID NO:8. A nucleic acid having SEQ ID NO:8 has GAC (encoding Asp) in place of GGC (encoding Gly) at positions 127-129 and TAC (encoding Tyr) in place of TTC (encoding Phe) at positions 1993-1995.

1 ATGCTTCCCC TCTTTGAGCC CAAGGGCCGG GTGCTCCTGG TGGACGGCCA
51 CCACCTGGCC TACCGTAACT TCTTCGCCCT CAAGGGGCTC ACCACGAGCC
101 GGGGCGAGCC CGTGCAAGGG GTCTACGACT TCGCCAAAAG CCTCCTCAAG
151 GCCCTGAAGG AGGACGGGGA CGTGGTCATC GTGGTCTTTG ACGCCAAGGC
201 CCCCTCTTTT CGCCACGAGG CCTACGGGGC CTACAAGGCG GGCCGGGCCC
251 CTACCCCGGA GGACTTTCCG AGGCAGCTTG CCCTCATGAA GGAGCTTGTG
301 GACCTTTTGG GGCTGGAGCG CCTCGAGGTC CCGGGCTTTG AGGCGGACGA
351 TGTCCTCGCC GCCCTGGCCA AGAAGGCGGA GCGGGAAGGG TACGAGGTGC
401 GCATCCTCAC CGCCGACCGG GACCTCTTCC AGCTTCTTTC GGACCGCATC
451 GCCGTCCTGC ACCCGGAAGG CCACCTCATC ACCCCGGGGT GGCTTTGGGA
501 GAGGTACGGC CTGAGACCGG AGCAGTGGGT GGACTTCCGC GCCCTGGCCG
551 GCGACCCTTC CGACAACATC CCCGGGGTGA AGGGGATCGG CGAGAAGACG
601 GCCCTGAAGC TCCTAAAGGA GTGGGGTAGT CTGGAAAATA TCCAAAAAAA
651 CCTGGACCAG GTCAGTCCCC CTTCCGTGCG CGAGAAGATC CAGGCCCACC
701 TGGACGACCT CAGGCTCTCC CAGGAGCTTT CCCGGGTGCG CACGGACCTT
751 CCCTTGGAGG TGGACTTTAG AAGGCGGCGG GAGCCCGATA GGGAAGGCCT
801 TAGGGCCTTC TTAGAGCGGC TTGAGTTCGG GAGCCTCCTC CACGAGTTCG
851 GCCTCCTGGA AAGCCCCCAG GCGGCGGAGG AGGCCCCTTG GCCGCCGCCG
901 GAAGGGGCCT TCTTGGGCTT CCGCCTCTCC CGGCCCGAGC CCATGTGGGC
951 GGAACTCCTT TCCTTGGCGG CAAGCGCCAA GGGCCGGGTC TACCGGGCGG
1001 AGGCGCCCCA TAAGGCCCTT TCGGACCTGA AGGAGATCCG GGGGCTTCTC
1051 GCCAAGGACC TCGCCGTCTT GGCCCTGAGG GAGGGGCTCG GCCTTCCCCC
1101 CACGGACGAT CCCATGCTCC TCGCCTACCT CCTGGACCCC TCCAACACCA
1151 CCCCCGAGGG CGTGGCCCGG CGCTACGGGG GGGAGTGGAC GGAGGAGGCG
1201 GGGGAGAGGG CCTTGCTTGC CGAAAGGCTT TACGAGAACC TCCTAAGCCG
1251 CCTGAAAGGG GAAGAAAAGC TCCTTTGGCT CTACGAGGAG GTGGAAAAGC
1301 CCCTTTCCCG GGTCCTCGCC CACATGGAGG CCACGGGGGT GAGGCTGGAC
1351 GTACCCTACC TAAGGGCCCT TTCCCTGGAG GTGGCGGCGG AGATGGGCCG
1401 CCTGGAGGAG GAGGTTTTCC GCCTGGCGGG CCACCCCTTC AACCTGAACT
1451 CCCGCGACCA GCTGGAAAGG GTGCTCTTTG ACGAGCTCGG GCTTCCCCCC
1501 ATCGGCAAGA CGGAAAAAAC CGGGAAGCGC TCCACCAGCG CCGCCGTCCT
1551 CGAGGCCCTG CGGGAGGCCC ACCCCATCGT GGAGAAGATC CTCCAGTACC
1601 GGGAGCTCGC CAAGCTCAAG GGCACCTACA TTGACCCCCT TCCCGCCCTG
1651 GTCCACCCCA GGACGGGCAG GCTCCACACC CGCTTCAACC AGACGGCCAC
1701 GGCCACGGGC CGCCTTTCCA GCTCCGACCC CAACCTGCAG AACATTCCCG
1751 TGCGCACCCC CTTGGGCCAA AGGATCCGCC GGGCCTTCGT GGCCGAGGAG
1801 GGGTACCTTC TCGTGGCCCT GGACTATAGC CAGATTGAGC TGAGGGTCCT
1851 GGCCCACCTC TCGGGGGACG AAAACCTCAT CCGGGTCTTC CAGGAGGGCC
1901 GGGACATCCA CACCCAGACG GCGAGCTGGA TGTTCGGCCT GCCGGCGGAG
1951 GCCATAGACC CCCTCAGGCG CCGGGCGGCC AAGACCATCA ACTACGGCGT
2001 CCTCTACGGC ATGTCCGCCC ACCGGCTTTC CCAGGAGCTG GGCATCCCCT
2051 ACGAGGAGGC GGTGGCCTTC ATTGACCGCT ATTTCCAGAG CTACCCCAAG
2101 GTGAAGGCCT GGATTGAAAG GACCCTGGAG GAGGGGCGGC AAAGGGGGTA
2151 CGTGGAGACC CTCTTCGGCC GCAGGCGCTA CGTGCCCGAC CTCAACGCCC
2201 GGGTAAAGAG CGTGCGGGAG GCGGCGGAGC GCATGGCCTT TAACATGCCC
2251 GTGCAGGGCA CCGCCGCTGA CCTGATGAAG CTCGCCATGG TGAGGCTCTT
2301 CCCTAGGCTT CCCCAGGTGG GGGCGAGGAT GCTCCTCCAG GTCCACGACG
2351 AGCTCCTCCT GGAGGCGCCC AAGGAGCGGG CGGAGGAGGC GGCGGCCCTG
2401 GCCAAGGAGG TCATGGAGGG AGTCTGGCCC CTGGCCGTGC CCCTGGAGGT
2451 GGAGGTGGGC ATCGGGGAGG ACTGGCTTTC CGCCAAGGGC TAGTCGAC

The substitution of TAC (encoding Tyr) for TTC (encoding Phe) at the indicated positions can reduce discrimination against ddNTP incorporation by DNA polymerase I. See, e.g., U.S. Pat. No. 5,614,365, which is incorporated herein by reference. The substitution of GAC (encoding Asp) for GGC (encoding Gly) at the indicated positions removes the 5′-3′ exonuclease activity.
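Converting the nucleotide positions above into residue numbers is simple arithmetic. Codon n spans nucleotides 3n-2 to 3n, counting the initiating ATG as codon 1. Positions 127-129 are therefore codon 43, and positions 1993-1995 are codon 665. The sketch below performs that conversion with a minimal codon lookup copied from Table 1. The labels it prints (Gly43Asp, Phe665Tyr) are derived here for illustration; the disclosure does not state this residue numbering.

```python
# Only the four codons involved here, taken from Table 1 (standard genetic code).
CODONS = {"GGC": "Gly", "GAC": "Asp", "TTC": "Phe", "TAC": "Tyr"}


def mutation_label(nt_start: int, ref_codon: str, alt_codon: str) -> str:
    """Describe an in-frame codon change as e.g. 'Gly43Asp' (ATG = codon 1)."""
    if (nt_start - 1) % 3 != 0:
        raise ValueError("nt_start must be the first base of an in-frame codon")
    codon_number = (nt_start - 1) // 3 + 1
    return f"{CODONS[ref_codon]}{codon_number}{CODONS[alt_codon]}"


for start, ref, alt in [(127, "GGC", "GAC"), (1993, "TTC", "TAC")]:
    print(f"nt {start}-{start + 2}: {ref}->{alt} = {mutation_label(start, ref, alt)}")
```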

The nucleic acids of the invention have homology to portions of the DNA sequences encoding the thermostable DNA polymerases of Thermus aquaticus and Thermus thermophilus (see FIG. 1). However, significant portions of the nucleic acid sequences of the present invention are distinct.

The invention also encompasses fragment and variant nucleic acids of SEQ ID NO:1-8. Nucleic acid “fragments” encompassed by the invention are of two general types. First, the invention encompasses fragment nucleic acids that do not encode a full-length DNA polymerase but do encode a thermally stable polypeptide with DNA polymerase activity. Second, it encompasses fragment nucleic acids that are useful as hybridization probes but generally do not encode polymerases retaining biological activity. Thus, fragments of nucleotide sequences such as SEQ ID NO:1-8 may be as small as about 9 nucleotides, about 12 nucleotides, about 15 nucleotides, about 17 nucleotides, about 18 nucleotides, about 20 nucleotides, about 50 nucleotides, about 100 nucleotides or more. In general, a fragment nucleic acid of the invention can have any upper size limit so long as it is related in sequence to the nucleic acids of the invention but is not full length.
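A fragment here is any contiguous subsequence shorter than the full length, so candidate probe-sized fragments can be enumerated as sliding windows. The minimal sketch below assumes a plain ACGT string. The GC-percentage column is an illustrative extra, not a selection criterion from the disclosure; in practice, further filtering would be needed and is not shown. With a step of 1, a sequence of length L yields L - k + 1 windows of length k. For example, a 2,493-nt coding sequence gives 2,479 fragments of 15 nt.

```python
from typing import Iterator, Tuple


def fragments(seq: str, k: int, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, fragment) for each k-nt window of seq."""
    if not 0 < k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    if step < 1:
        raise ValueError("step must be positive")
    for i in range(0, len(seq) - k + 1, step):
        yield i + 1, seq[i:i + k]


def gc_percent(s: str) -> float:
    """Percentage of G and C bases in s."""
    return 100.0 * sum(base in "GC" for base in s.upper()) / len(s)


first_row = "ATGCTTCCCCTCTTTGAGCCCAAGGGCCGGGTGCTCCTGGTGGACGGCCA"  # SEQ ID NO:1, nt 1-50
for start, frag in fragments(first_row, 15, step=10):
    print(start, frag, f"{gc_percent(frag):.0f}% GC")
```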

As indicated above, “variants” are substantially similar or substantially homologous sequences. For nucleotide sequences, variants include those sequences that, because of the degeneracy of the genetic code, encode the identical amino acid sequence of the native DNA polymerase I protein. Variant nucleic acids also include those that encode polypeptides whose amino acid sequences are not identical to that of a native DNA polymerase I protein. Such variants nonetheless encode an active, thermally stable DNA polymerase I with conservative changes in the amino acid sequence.

As is known by one of skill in the art, the genetic code is “degenerate,” meaning that several trinucleotide codons can encode the same amino acid. This degeneracy is apparent from Table 1.

TABLE 1

1st        Second Position                                     3rd
Position   T            C            A            G            Position
T          TTT = Phe    TCT = Ser    TAT = Tyr    TGT = Cys    T
T          TTC = Phe    TCC = Ser    TAC = Tyr    TGC = Cys    C
T          TTA = Leu    TCA = Ser    TAA = Stop   TGA = Stop   A
T          TTG = Leu    TCG = Ser    TAG = Stop   TGG = Trp    G
C          CTT = Leu    CCT = Pro    CAT = His    CGT = Arg    T
C          CTC = Leu    CCC = Pro    CAC = His    CGC = Arg    C
C          CTA = Leu    CCA = Pro    CAA = Gln    CGA = Arg    A
C          CTG = Leu    CCG = Pro    CAG = Gln    CGG = Arg    G
A          ATT = Ile    ACT = Thr    AAT = Asn    AGT = Ser    T
A          ATC = Ile    ACC = Thr    AAC = Asn    AGC = Ser    C
A          ATA = Ile    ACA = Thr    AAA = Lys    AGA = Arg    A
A          ATG = Met    ACG = Thr    AAG = Lys    AGG = Arg    G
G          GTT = Val    GCT = Ala    GAT = Asp    GGT = Gly    T
G          GTC = Val    GCC = Ala    GAC = Asp    GGC = Gly    C
G          GTA = Val    GCA = Ala    GAA = Glu    GGA = Gly    A
G          GTG = Val    GCG = Ala    GAG = Glu    GGG = Gly    G

Hence, many changes in the nucleotide sequence of the variant may be silent and may not alter the amino acid sequence encoded by the nucleic acid. Where nucleic acid sequence alterations are silent, a variant nucleic acid will encode a polypeptide with the same amino acid sequence as the reference nucleic acid. Therefore, a particular nucleic acid sequence of the invention encompasses the sequence explicitly specified by a SEQ ID NO. It also encompasses variants with degenerate codon substitutions, and complementary sequences thereof. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the reference codon is replaced by any of the codons for the amino acid specified by the reference codon. In general, the third position of one or more selected codons can be substituted with mixed-base and/or deoxyinosine residues. Such substitutions are disclosed by Batzer et al., Nucleic Acids Res., 19, 5081 (1991); Ohtsuka et al., J. Biol. Chem., 260, 2605 (1985); and Rossolini et al., Mol. Cell. Probes, 8, 91 (1994).
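The degeneracy shown in Table 1 can be captured in a small lookup table. The sketch below assumes the standard genetic code and uses one-letter amino-acid codes, with "*" for stop. It groups synonymous codons and tests whether a nucleotide change is silent, that is, whether reference and variant translate to the same amino acid sequence. The function names are illustrative.

```python
from collections import defaultdict

BASES = "TCAG"
# Table 1 in one-letter form ("*" = stop); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codons() -> dict:
    """Group the 64 codons by the amino acid (or stop) they specify."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        groups[aa].append(codon)
    return dict(groups)


def translate(seq: str) -> str:
    """Translate complete codons of seq in frame 1 (trailing bases ignored)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def is_silent(reference: str, variant: str) -> bool:
    """True if two equal-length coding sequences encode the same protein."""
    return len(reference) == len(variant) and translate(reference) == translate(variant)


groups = synonymous_codons()
print({aa: len(codons) for aa, codons in sorted(groups.items())})
print(is_silent("GGCTTC", "GGGTTT"))  # Gly-Phe either way: silent
print(is_silent("GGCTTC", "GACTTC"))  # Gly -> Asp: not silent
```

Replacing any codon with another member of its group leaves `translate` unchanged. This is the silent, degenerate-codon variation described above.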

However, the invention is not limited to silent changes in the present nucleotide sequences but also includes variant nucleic acid sequences that conservatively alter the amino acid sequence of a polypeptide of the invention. According to the present invention, variant and reference nucleic acids may differ in the encoded amino acid sequence by one or more substitutions, additions, insertions, deletions, fusions and truncations, in any combination. The only requirement is that the variant nucleic acid encodes an active, thermally stable DNA polymerase. Such variant nucleic acids will not encode exactly the same amino acid sequence as the reference nucleic acid, but have conservative sequence changes.

Variant nucleic acids with silent and conservative changes can be defined and characterized by the degree of homology to the reference nucleic acid. Preferred variant nucleic acids are “substantially homologous” to the reference nucleic acids of the invention. As recognized by one of skill in the art, such substantially homologous nucleic acids can hybridize under stringent conditions with the reference nucleic acids identified by SEQ ID NOs herein. These types of substantially homologous nucleic acids are encompassed by this invention.

Generally, nucleic acid derivatives and variants of the invention will have at least 90%, 91%, 92%, 93% or 94% sequence identity to the reference nucleotide sequence defined herein. Preferably, nucleic acids of the invention will have at least 95%, 96%, 97%, 98%, or 99% sequence identity to the reference nucleotide sequence defined herein.
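To make the identity thresholds concrete, the sketch below computes percent identity for two sequences that have already been aligned (with "-" marking gaps) and checks a threshold such as 90% or 95%. It assumes one common convention, identical columns divided by the number of aligned columns with a gap opposite a residue counted as a non-identical column; other programs may normalize differently, so this is illustrative only. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues.

    Assumes both strings come from the same alignment ("-" = gap). Columns
    where both rows are gaps are ignored; a gap opposite a residue counts as
    a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches the given identity floor (e.g. 90 or 95 percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0 (9 of 10 columns identical)
```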

Variant nucleic acids can be detected and isolated by standard hybridization procedures.

Hybridization to detect or isolate such sequences is generally carried out under stringent conditions. “Stringent hybridization conditions” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and Northern hybridization are sequence dependent, and are different under different environmental parameters. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, Part 1, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, N.Y. (1993). See also J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, N.Y., pp. 9.31-9.58 (1989); J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, N.Y. (3rd ed. 2001).

The invention also provides methods for detection and isolation of derivative or variant nucleic acids encoding a polypeptide with DNA polymerase I activity. The methods involve hybridizing at least a portion of a nucleic acid comprising SEQ ID NO:1, 2, 3, 4, 5, 6, 7 or 8 to a sample nucleic acid, thereby forming a hybridization complex, and detecting the hybridization complex. The presence of the complex correlates with the presence of a derivative or variant nucleic acid encoding at least a segment of DNA polymerase I. In general, the portion of a nucleic acid comprising SEQ ID NO:1, 2, 3, 4, 5, 6, 7 or 8 used for hybridization is at least fifteen nucleotides, and hybridization is carried out under conditions that are sufficiently stringent to permit detection and isolation of substantially homologous nucleic acids. In an alternative embodiment, a nucleic acid sample is amplified by the polymerase chain reaction using primer oligonucleotides selected from SEQ ID NO:1, 2, 3, 4, 5, 6, 7 or 8.
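As an in-silico analogue of the probing step, the sketch below scans a sample sequence for sites where a probe of at least fifteen nucleotides could pair antiparallel, allowing a chosen number of mismatches. It is only a sequence-level proxy: whether a duplex actually forms and survives washing depends on the salt, temperature, formamide and GC-content factors discussed below. The 15-mer used here is a made-up example, not a portion of SEQ ID NO:1-8, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_sites(probe: str, target: str, max_mismatches: int = 0, min_length: int = 15):
    """Find (start, mismatches) where `probe` could pair antiparallel with `target`.

    A probe pairs with the strand containing its reverse complement, so each
    window of `target` is compared with reverse_complement(probe). Positions
    are 0-based.
    """
    if len(probe) < min_length:
        raise ValueError(f"probe must be at least {min_length} nucleotides")
    site = reverse_complement(probe)
    target = target.upper()
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


probe = "ATGCTGCCGCTGTTC"  # hypothetical 15-mer
sample = "GGGG" + reverse_complement(probe) + "TTTT"
print(probe_sites(probe, sample))  # [(4, 0)]
```

Raising `max_mismatches` plays roughly the role that lowering stringency plays at the bench: more imperfect sites are accepted.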

Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (T_(m)) for the specific double-stranded sequence at a defined ionic strength and pH. For example, under “highly stringent conditions” or “highly stringent hybridization conditions,” a nucleic acid will hybridize to its complement to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). By controlling the stringency of the hybridization and/or washing conditions, nucleic acids that are 100% complementary can be identified.

Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.

Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C. and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl and 0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C.
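Because the wash conditions are expressed as multiples of SSC, it can help to convert an SSC strength into molar concentrations. The sketch below assumes only the 20×SSC recipe stated above (3.0 M NaCl, 0.3 M trisodium citrate) and linear dilution; it counts the three sodium ions per trisodium citrate when totaling Na⁺, which is one way to estimate the monovalent cation molarity M used in the T_(m) equation that follows. Function names are hypothetical.

```python
def ssc_composition(fold: float) -> dict:
    """Molar composition of `fold` x SSC, scaled linearly from 20x SSC
    (3.0 M NaCl and 0.3 M trisodium citrate)."""
    nacl = 3.0 * fold / 20.0
    citrate = 0.3 * fold / 20.0
    return {
        "NaCl_M": nacl,
        "trisodium_citrate_M": citrate,
        # Each trisodium citrate contributes three Na+ ions.
        "Na_M": nacl + 3.0 * citrate,
    }


def stock_volume_ml(target_fold: float, final_volume_ml: float, stock_fold: float = 20.0) -> float:
    """Volume of SSC stock needed for a working solution (C1 * V1 = C2 * V2)."""
    return final_volume_ml * target_fold / stock_fold


# 1x SSC: 0.15 M NaCl and 0.015 M citrate, i.e. about 0.195 M Na+ in total.
print(ssc_composition(1.0))
# 1 L of the 2x SSC low stringency wash needs 100 mL of 20x stock.
print(stock_volume_ml(2.0, 1000.0))
```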

The degree of complementarity or homology of hybrids obtained during hybridization is typically a function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. The type and length of hybridizing nucleic acids also affect whether hybridization will occur and whether any hybrids formed will be stable under a given set of hybridization and wash conditions. For DNA-DNA hybrids, the T_(m) can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem. 138:267-284 (1984):

$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,(\log_{10} M) + 0.41\,(\%\mathrm{GC}) - 0.61\,(\%\mathrm{form}) - 500/L$$

where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, %form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The T_(m) is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe.
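The Meinkoth and Wahl approximation can be evaluated directly. The sketch below is a straightforward transcription of the equation above, term by term, with the hybrid conditions in the example chosen arbitrarily for illustration (a 200 bp hybrid at 50% GC in 1×SSC, taken as about 0.195 M Na⁺, with no formamide). The function name is hypothetical.

```python
import math


def tm_meinkoth_wahl(na_molar: float, percent_gc: float,
                     percent_formamide: float, length_bp: int) -> float:
    """Approximate T_m (deg C) of a DNA-DNA hybrid:
    81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/L."""
    if na_molar <= 0:
        raise ValueError("monovalent cation molarity must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / length_bp
    )


# Hypothetical hybrid: 200 bp, 50% GC, about 0.195 M Na+ (1x SSC), no formamide.
print(round(tm_meinkoth_wahl(0.195, 50, 0, 200), 1))  # 87.7
```

Each percentage point of formamide lowers the estimate by 0.61° C., which is why formamide lets hybridization be run at a convenient 37 to 42° C. while remaining stringent.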

Very stringent conditions are selected to be equal to the T_(m) for a particular probe.

An example of stringent hybridization conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or Northern blot is 50% formamide with 1 mg of heparin at 42° C., with the hybridization being carried out overnight. An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see also Sambrook, supra). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example of a medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1×SSC at 45° C. for 15 minutes. An example of a low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6×SSC at 40° C. for 15 minutes. For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1.0 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is typically at least about 30° C.

Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. In general, a signal-to-noise ratio of 2× (or higher) relative to that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.

The following are examples of sets of hybridization/wash conditions that may be used to detect and isolate homologous nucleic acids that are substantially identical to reference nucleic acids of the present invention: a homologous nucleotide sequence preferably hybridizes to the reference nucleotide sequence in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO₄, 1 mM EDTA at 50° C. with washing in 2×SSC, 0.1% SDS at 50° C.; more desirably in 7% SDS, 0.5 M NaPO₄, 1 mM EDTA at 50° C. with washing in 1×SSC, 0.1% SDS at 50° C.; more desirably still in 7% SDS, 0.5 M NaPO₄, 1 mM EDTA at 50° C. with washing in 0.5×SSC, 0.1% SDS at 50° C.; preferably in 7% SDS, 0.5 M NaPO₄, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 50° C.; and more preferably in 7% SDS, 0.5 M NaPO₄, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 65° C.

In general, T_(m) is reduced by about 1° C. for each 1% of mismatching. Thus, T_(m), hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired sequence identity. For example, if sequences with >90% identity are sought, the T_(m) can be decreased by 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (T_(m)) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the T_(m); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the T_(m); and low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the T_(m).
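The two rules in this paragraph, roughly 1° C. per 1% mismatch and the bands of degrees below T_(m), can be written as small helpers. This is a sketch under stated assumptions: the text lists discrete offsets (1 to 4, about 5, 6 to 10, and 11 to 15 or 20° C.), so treating them as continuous bands with the boundaries chosen below is an interpretation, and "very stringent" for a temperature equal to T_(m) follows the earlier statement that very stringent conditions equal the T_(m). Function names are hypothetical.

```python
def wash_temperature_for_identity(tm_perfect_match: float, min_identity_percent: float) -> float:
    """Lower T_m by about 1 deg C per 1% mismatch to admit hybrids of the given identity."""
    if not 0 <= min_identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100 percent")
    return tm_perfect_match - (100.0 - min_identity_percent)


def stringency_band(tm: float, temperature: float) -> str:
    """Map degrees below T_m onto the bands described in the text.

    Band boundaries between the listed offsets are an interpretation.
    """
    delta = tm - temperature
    if delta < 0:
        return "above T_m"
    if delta == 0:
        return "very stringent"
    if delta < 5:
        return "severely stringent"
    if delta < 6:
        return "stringent"
    if delta <= 10:
        return "moderately stringent"
    if delta <= 20:
        return "low stringency"
    return "outside the described ranges"


# Seeking >90% identity with a hypothetical 70 deg C T_m suggests working near 60 deg C.
print(wash_temperature_for_identity(70, 90))  # 60.0
print(stringency_band(70, 60))  # moderately stringent
```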

If the desired degree of mismatching results in a T_(m) of less than 45° C. (aqueous solution) or 32° C. (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, Part 1, Chapter 2 (Elsevier, New York); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See also Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). Using these references and the teachings herein on the relationship between T_(m), mismatch, and hybridization and wash conditions, those of ordinary skill can generate variants of the present DNA polymerase I nucleic acids.
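How much the salt must be raised can be estimated from the 16.6·log₁₀M term of the Meinkoth and Wahl equation: multiplying the monovalent cation molarity by a factor f raises T_(m) by 16.6·log₁₀f. The sketch below assumes that only M changes, that the equation remains valid at the new salt level, and that 1×SSC corresponds to about 0.195 M Na⁺; the result should still be checked against a practical SSC concentration. Names are hypothetical.

```python
MIN_TM_AQUEOUS = 45.0    # deg C, minimum stated in the text for aqueous solution
MIN_TM_FORMAMIDE = 32.0  # deg C, minimum stated in the text for formamide solution
NA_PER_1X_SSC = 0.195    # M Na+ in 1x SSC (0.15 M NaCl + 3 x 0.015 M citrate)


def salt_needed(current_na_molar: float, current_tm: float, formamide: bool = False) -> float:
    """Na+ molarity that lifts T_m to the stated minimum via the 16.6*log10(M) term."""
    floor = MIN_TM_FORMAMIDE if formamide else MIN_TM_AQUEOUS
    if current_tm >= floor:
        return current_na_molar  # already at or above the minimum
    return current_na_molar * 10 ** ((floor - current_tm) / 16.6)


def ssc_fold_for(na_molar: float) -> float:
    """Express a Na+ molarity as an approximate multiple of SSC."""
    return na_molar / NA_PER_1X_SSC


# A hypothetical hybrid at 0.195 M Na+ whose T_m has dropped to 40 deg C.
na = salt_needed(0.195, 40.0)
print(round(na, 3), round(ssc_fold_for(na), 1))
```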

Computer analyses can also be utilized for comparison of sequences to determine sequence identity. Such analyses include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0); and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153 (1989); Corpet et al. Nucleic Acids Res. 16:10881-90 (1988); Huang et al. CABIOS 8:155-65 (1992); and Pearson et al. Meth. Mol. Biol. 24:307-331 (1994). The ALIGN program is based on the algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al., J. Mol. Biol. 215:403 (1990), are based on the algorithm of Karlin and Altschul, supra. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. Nucleic Acids Res. 25:3389 (1997). Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al., supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g. BLASTN for nucleotide sequences, BLASTX for proteins) can be used. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA, 89, 10915 (1992)). See http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by inspection.

For purposes of the present invention, comparison of nucleotide sequences for determination of percent sequence identity to the DNA polymerase sequences disclosed herein is preferably made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the preferred program.
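To show how an alignment turns into a percent identity, the sketch below uses Needleman-Wunsch global alignment with a linear gap penalty. This is a swapped-in textbook technique, not BLAST: BlastN performs heuristic local alignment and will not in general produce identical alignments, so this is not an "equivalent program" in the sense defined above. The match and mismatch scores mirror the BLASTN M=5 and N=−4 values quoted earlier; the gap penalty of −10 is a hypothetical choice, and identity here is computed over the full alignment length including gap columns.

```python
def global_align(a: str, b: str, match: int = 5, mismatch: int = -4, gap: int = -10):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def alignment_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of alignment length (gap columns included)."""
    if not aligned_a:
        raise ValueError("empty alignment")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


top, bottom, total = global_align("ACGTACGT", "ACGACGT")
print(top, bottom, total, alignment_identity(top, bottom))  # ACGTACGT ACG-ACGT 25 87.5
```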

Expression of Polymerase Nucleic Acids

Nucleic acids of the invention may be used for the recombinant expression of the polymerase polypeptides of the invention. Generally, recombinant expression of a polymerase polypeptide of the invention is effected by introducing a nucleic acid encoding that polypeptide into an expression vector adapted for use in a particular type of host cell. The nucleic acids of the invention can be introduced and expressed in any host organism, for example, in either prokaryotic or eukaryotic host cells. Examples of host cells include bacterial cells, yeast cells, cultured insect cell lines, and cultured mammalian cell lines. Preferably, a recombinant host cell system is selected that processes and post-translationally modifies nascent polypeptides in a manner similar to that of the organism from which the polymerase was derived. For purposes of expressing and isolating polymerase polypeptides of the invention, prokaryotic organisms are preferred, for example, Escherichia coli. Accordingly, the invention provides host cells comprising the expression vectors of the invention.

The nucleic acids to be introduced can be conveniently placed in expression cassettes for expression in an organism of interest. Such expression cassettes will comprise a transcriptional initiation region linked to a nucleic acid of the invention. Expression cassettes preferably also have a plurality of restriction sites for insertion of the nucleic acid so that it is under the transcriptional regulation of various control elements. The expression cassette additionally may contain selectable marker genes. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc., or a combination of both endogenous and exogenous control elements.

Preferably, the nucleic acid in the vector is under the control of, and operably linked to, an appropriate promoter or other regulatory elements for transcription in a host cell. The vector may be a bi-functional expression vector that functions in multiple hosts. The transcriptional cassette generally includes, in the 5′-3′ direction of transcription, a promoter, a transcriptional and translational initiation region, a DNA sequence of interest, and a transcriptional and translational termination region functional in the organism. The termination region may be native with the transcriptional initiation region, may be native with the DNA sequence of interest, or may be derived from another source.

Efficient expression of recombinant DNA sequences in prokaryotic and eukaryotic cells generally requires regulatory control elements directing the efficient termination and polyadenylation of the resulting transcript. Transcription termination signals are generally found downstream of the polyadenylation signal and are a few hundred nucleotides in length. The term “poly A site” or “poly A sequence” as used herein denotes a DNA sequence that directs both the termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the recombinant transcript is desirable, as transcripts lacking a poly A tail are unstable and are rapidly degraded.

Nucleic acids encoding DNA polymerase I may be introduced into bacterial host cells by any method known to one of skill in the art. For example, nucleic acids encoding a thermophilic DNA polymerase I can be introduced into bacterial cells by commonly used transformation procedures such as treatment with calcium chloride or electroporation. If the thermophilic DNA polymerase I is to be expressed in eukaryotic host cells, nucleic acids encoding the thermophilic DNA polymerase I may be introduced into eukaryotic host cells by a number of means including calcium phosphate co-precipitation, spheroplast fusion, electroporation and the like. When the eukaryotic host cell is a yeast cell, transformation may be effected by treatment of the host cells with lithium acetate or by electroporation.

Thus, one aspect of the invention is to provide expression vectors and host cells comprising a nucleic acid encoding a DNA polymerase polypeptide of the invention. A range of expression vectors is available in the art. Descriptions of various expression vectors and how to use them can be found, among other places, in U.S. Pat. Nos. 5,604,118; 5,583,023; 5,432,082; 5,266,490; 5,063,158; 4,966,841; 4,806,472; 4,801,537; and Goeddel et al., Gene Expression Technology, Methods in Enzymology, Vol. 185, Academic Press, San Diego (1989). The expression of polymerases in recombinant cell systems is an established technique. Examples of the recombinant expression of DNA polymerase can be found in U.S. Pat. Nos. 5,602,756; 5,545,552; 5,541,311; 5,500,363; 5,489,523; 5,455,170; 5,352,778; 5,322,785; and 4,935,361.

Recombinant DNA and molecular cloning techniques that can be used to help make and use aspects of the invention are described by Sambrook et al., Molecular Cloning: A Laboratory Manual Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (2001); Ausubel (ed.), Current Protocols in Molecular Biology, John Wiley and Sons, Inc. (1994); T. Maniatis, E. F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989); and T. J. Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1984).

Nucleic Acid Polymerases

The invention provides Thermus brockianus polymerase polypeptides, as well as fragments thereof and variant polymerase polypeptides that are active and thermally stable. Any polypeptide containing amino acid sequence SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 or SEQ ID NO:16, which are the amino acid sequences for wild type and derivative Thermus brockianus polymerases, is contemplated by the present invention. The polypeptides of the invention are isolated or substantially purified polypeptides. In particular, the isolated polypeptides of the invention are substantially free of proteins normally present in Thermus brockianus bacteria.

In one embodiment, the invention provides a wild type Thermus brockianus polymerase from strain YS38 having SEQ ID NO:9.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYGFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDLLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINFGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

In another embodiment, the invention provides a wild type Thermus brockianus polymerase from strain 2AZN having SEQ ID NO:10. The 2AZN amino acid sequence differs from the YS38 sequence by one amino acid: the 2AZN strain has proline instead of leucine at position 546.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYGFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDPLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINFGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829
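Differences between two equal-length, ungapped sequences such as the YS38 and 2AZN polymerases can be listed in the conventional wild-type/position/variant notation. The sketch below is a hypothetical helper, not part of the invention; its `start` parameter lets a fragment be compared while keeping full-length numbering, and the example uses residues 541-550 as printed in SEQ ID NO:9 and SEQ ID NO:10.

```python
def substitutions(reference: str, variant: str, start: int = 1) -> list[str]:
    """List substitutions as <ref><position><variant> for equal-length, ungapped sequences.

    `start` is the residue number of the first character of both strings.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=start)
        if ref != var
    ]


# Residues 541-550 of the YS38 (SEQ ID NO:9) and 2AZN (SEQ ID NO:10) sequences.
print(substitutions("GTYIDLLPAL", "GTYIDPLPAL", start=541))  # ['L546P']
```

Applied to the full-length sequences, the same comparison would be expected to return only `L546P`, matching the single difference stated in the text.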

Significant portions of the Thermus brockianus polymerase sequences are distinct from other polymerases, including, for example, a peptide at positions 22-25 (RNFF, SEQ ID NO:9), a peptide at positions 39-42 (QGVY, SEQ ID NO:17), a peptide at positions 76-79 (GAYK, SEQ ID NO:18), a peptide at positions 95-98 (LMKE, SEQ ID NO:19), a peptide at positions 111-114 (PGFE, SEQ ID NO:20), a peptide at positions 106-121 (ERLEVPGFEADDVLAA, SEQ ID NO:21), a peptide at positions 161-164 (TPGW, SEQ ID NO:22), a peptide at positions 182-186 (LAGDP, SEQ ID NO:23), a peptide at positions 213-216 (NIQK, SEQ ID NO:24), a peptide at positions 220-224 (QVSPP, SEQ ID NO:25), a peptide at positions 228-231 (EKIQ, SEQ ID NO:26), a peptide at positions 238-242 (RLSQE, SEQ ID NO:27), a peptide at positions 256-261 (FRRRRE, SEQ ID NO:28), a peptide at positions 288-292 (SPQAA, SEQ ID NO:29), a peptide at positions 305-308 (LGFR, SEQ ID NO:30), a peptide at positions 318-321 (ELLS, SEQ ID NO:31), a peptide at positions 325-331 (SAKGRVY, SEQ ID NO:32), a peptide at positions 334-337 (EAPH, SEQ ID NO:33), a peptide at positions 334-341 (EAPHKALS, SEQ ID NO:34), a peptide at positions 407-412 (AERLYE, SEQ ID NO:35), a peptide at positions 415-419 (LSRLK, SEQ ID NO:36), a peptide at positions 428-431 (YEEV, SEQ ID NO:37), a peptide at positions 465-468 (MGRL, SEQ ID NO:38), a peptide at positions 537-541 (AKLKG, SEQ ID NO:39), a peptide at positions 545-549 (LPAL, SEQ ID NO:40), a peptide at positions 600-603 (EGYL, SEQ ID NO:41), a peptide at positions 647-650 (LPAE, SEQ ID NO:42), a peptide at positions 648-652 (PAEAI, SEQ ID NO:43), a peptide at positions 655-658 (LRRR, SEQ ID NO:44), a peptide at positions 690-693 (IDRY, SEQ ID NO:45), a peptide at positions 698-702 (YPKVK, SEQ ID NO:46), a peptide at positions 712-715 (GRQR, SEQ ID NO:47), a peptide at positions 765-773 (RLFPRLPEV, SEQ ID NO:48) and a peptide at positions 807-810 (GVWP, SEQ ID NO:49).
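Peptide positions like those listed above are 1-based and inclusive. A small helper that reports every occurrence of a motif in that convention makes such positions easy to check against a sequence. The sketch is hypothetical; the example string is residues 1-50 of SEQ ID NO:9 exactly as printed above, with spaces removed.

```python
def motif_positions(sequence: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of every occurrence
    of `motif` in `sequence`, overlapping occurrences included."""
    if not motif:
        raise ValueError("motif must be non-empty")
    hits = []
    index = sequence.find(motif)
    while index != -1:
        hits.append((index + 1, index + len(motif)))
        index = sequence.find(motif, index + 1)  # step by one to catch overlaps
    return hits


# Residues 1-50 of the YS38 polymerase (SEQ ID NO:9).
YS38_1_50 = "MLPLFEPKGRVLLVDGHHLAYRNFFALKGLTTSRGEPVQGVYGFAKSLLK"
print(motif_positions(YS38_1_50, "RNFF"))  # [(22, 25)]
print(motif_positions(YS38_1_50, "QGVY"))  # [(39, 42)]
```

Running the same helper over the full 829-residue listing is a quick way to cross-check each position range cited in the paragraph above.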

Many DNA polymerases possess activities in addition to a DNA polymerase activity. Such activities include, for example, a 5′-3′ exonuclease activity and/or a 3′-5′ exonuclease activity. The 3′-5′ exonuclease activity improves the accuracy of the newly-synthesized strand by removing incorrect bases that may have been incorporated. DNA polymerases in which such activity is low or absent are prone to errors in the incorporation of nucleotide residues into the primer extension strand. Taq DNA polymerase has been reported to have low 3′-5′ exonuclease activity. See Lawyer et al., J. Biol. Chem. 264:6427-6437. In applications such as nucleic acid amplification procedures, in which the replication of DNA is often geometric in relation to the number of primer extension cycles, such errors can lead to serious artifactual problems such as sequence heterogeneity of the nucleic acid amplification product (amplicon). Thus, a 3′-5′ exonuclease activity is a desired characteristic of a thermostable DNA polymerase used for such purposes.
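Why errors matter more under geometric amplification can be illustrated with a deliberately simplified model. The sketch assumes misincorporations occur independently at a fixed per-nucleotide rate, that each product strand has been copied about once per template doubling along its lineage, and that the number of doublings is the cycle count times log₂(1 + efficiency). The error rates in the example are hypothetical placeholders, not measured values for any polymerase described here.

```python
import math


def doublings(cycles: int, efficiency: float = 1.0) -> float:
    """Effective template doublings; efficiency 1.0 means perfect doubling each cycle."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return cycles * math.log2(1.0 + efficiency)


def error_free_fraction(error_rate_per_nt: float, amplicon_length: int, n_doublings: float) -> float:
    """Expected fraction of product strands with no misincorporation.

    Simplified model: independent errors, with each strand's lineage exposing
    amplicon_length positions to error once per doubling.
    """
    return (1.0 - error_rate_per_nt) ** (amplicon_length * n_doublings)


# Hypothetical comparison: 1 kb amplicon, 30 perfect cycles, two per-nucleotide error rates.
d = doublings(30)
low_fidelity = error_free_fraction(1e-4, 1000, d)   # roughly 5% of strands error-free
high_fidelity = error_free_fraction(1e-6, 1000, d)  # roughly 97% of strands error-free
print(round(low_fidelity, 3), round(high_fidelity, 3))
```

Under this model a hundredfold lower error rate, of the kind a proofreading 3′-5′ exonuclease can provide, turns a mostly heterogeneous amplicon population into a mostly correct one.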

By contrast, the 5′-3′ exonuclease activity of DNA polymerase enzymes is often undesirable because this activity may digest nucleic acids, including primers, which have an unprotected 5′ end. Thus, a thermostable polymerase with an attenuated 5′-3′ exonuclease activity, or in which such activity is absent, is a desired characteristic of an enzyme for biochemical applications. Various DNA polymerase enzymes have been described where a modification has been introduced in a DNA polymerase that accomplishes this object. For example, the Klenow fragment of E. coli DNA polymerase I can be produced as a proteolytic fragment of the holoenzyme in which the domain of the protein controlling the 5′-3′ exonuclease activity has been removed. The Klenow fragment still retains the polymerase activity and the 3′-5′ exonuclease activity. Barnes, PCT Publication No. WO92/06188 (1992), and Gelfand et al., U.S. Pat. No. 5,466,591, have produced 5′-3′ exonuclease-deficient recombinant Thermus aquaticus DNA polymerases. Ishino et al., EPO Publication No. 0517418A2, have produced a 5′-3′ exonuclease-deficient DNA polymerase derived from Bacillus caldotenax.

In another embodiment, the invention provides a polypeptide that is a derivative Thermus brockianus polypeptide with reduced or eliminated 5′-3′ exonuclease activity. Several methods exist for reducing this activity, and the invention contemplates any polypeptide derived from the Thermus brockianus polypeptides of the invention that has reduced or eliminated 5′-3′ exonuclease activity. See, e.g., Xu et al., Biochemical and mutational studies of the 5′-3′ exonuclease of DNA polymerase I of Escherichia coli, J. Mol. Biol. 268(2):284-302 (1997). In one embodiment, Asp is used in place of Gly at position 43 to produce a polypeptide with reduced 5′-3′ exonuclease activity.

Hence, the invention provides a derivative polypeptide having SEQ ID NO:11 that is related to a Thermus brockianus polymerase polypeptide from strain YS38, wherein Asp is used in place of Gly at position 43.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYDFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDLLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINFGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829
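Derivatives such as G43D (and F665Y below) are single-residue substitutions on the wild type numbering. The hypothetical helper below applies a substitution written in that notation and first checks that the expected wild-type residue is present, so a mutation cannot silently be applied to the wrong sequence or an off-by-one position. The example uses residues 1-50 of SEQ ID NO:9 as printed above.

```python
import re

_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution written as <wild type><position><new>, e.g. 'G43D'.

    Positions are 1-based; the wild-type residue is verified before substituting.
    """
    match = _MUTATION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + new + sequence[position:]


# Residues 1-50 of the YS38 polymerase (SEQ ID NO:9).
YS38_1_50 = "MLPLFEPKGRVLLVDGHHLAYRNFFALKGLTTSRGEPVQGVYGFAKSLLK"
print(apply_substitution(YS38_1_50, "G43D")[40:50])  # VYDFAKSLLK
```

Combined derivatives can be built by applying substitutions one after another, for example G43D followed by F665Y on the full-length sequence.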

The invention also provides a derivative polypeptide having SEQ ID NO:12 that is related to a Thermus brockianus polymerase polypeptide from strain 2AZN, wherein Asp is used in place of Gly at position 43.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYDFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDPLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINFGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

In another embodiment, the invention provides a derivative polypeptide having SEQ ID NO:13 that is related to the Thermus brockianus polymerase polypeptide from strain YS38, and that has Tyr in place of Phe at position 665. This derivative polypeptide has reduced bias against ddNTP incorporation. The sequence of SEQ ID NO:13 is below.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYGFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDLLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINYGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

In another embodiment, the invention provides a derivative polypeptide having SEQ ID NO:14 that is related to the Thermus brockianus polymerase polypeptide from strain 2AZN and that has Tyr in place of Phe at position 665. This derivative polypeptide has reduced bias against ddNTP incorporation. The sequence of SEQ ID NO:14 is below.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYGFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDPLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINYGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

In another embodiment, the invention provides a derivative polypeptide having SEQ ID NO:15, related to a Thermus brockianus polypeptide from strain YS38, with reduced 5′-3′ exonuclease activity and reduced bias against ddNTP incorporation. SEQ ID NO:15 has Asp in place of Gly at position 43 and Tyr in place of Phe at position 665. The sequence of SEQ ID NO:15 is below.

1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYDFAKSLLK 50
51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDLLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINYGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

In another embodiment, the invention provides a derivative polypeptide having SEQ ID NO:15, related to a Thermus brockianus polypeptide from strain YS38 with reduced 5′-3′ exonuclease activity and reduced bias against ddNTP incorporation. SEQ ID NO:15 has Asp in place of Gly at position 43 and Tyr in place of Phe at position 665. The sequence of SEQ ID NO:15 is below.

  1 MLPLFEPKGR VLLVDGHHLA YRNFFALKGL TTSRGEPVQG VYDFAKSLLK  50
 51 ALKEDGDVVI VVFDAKAPSF RHEAYGAYKA GRAPTPEDFP RQLALMKELV 100
101 DLLGLERLEV PGFEADDVLA ALAKKAEREG YEVRILTADR DLFQLLSDRI 150
151 AVLHPEGHLI TPGWLWERYG LRPEQWVDFR ALAGDPSDNI PGVKGIGEKT 200
201 ALKLLKEWGS LENIQKNLDQ VSPPSVREKI QAHLDDLRLS QELSRVRTDL 250
251 PLEVDFRRRR EPDREGLRAF LERLEFGSLL HEFGLLESPQ AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDPLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINYGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

The DNA polymerase polypeptides of the invention have homology to portions of the amino acid sequences of the thermostable DNA polymerases of Thermus aquaticus and Thermus thermophilus (see FIG. 1). However, significant portions of the amino acid sequences of the present invention are distinct, including SEQ ID NO:17-49.

As indicated above, derivative and variant polypeptides of the invention are derived from the wild type Thermus brockianus polymerases by deletion or addition of one or more amino acids to the N-terminal and/or C-terminal end of the wild type polypeptide; deletion or addition of one or more amino acids at one or more sites within the wild type polypeptide; or substitution of one or more amino acids at one or more sites within the wild type polypeptide. Thus, the polypeptides of the invention may be altered in various ways, including amino acid substitutions, deletions, truncations, and insertions.

Such variant and derivative polypeptides may result, for example, from genetic polymorphism or from human manipulation. Methods for such manipulations are generally known in the art. For example, amino acid sequence variants of the polypeptides can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations are well known in the art. See, for example, Kunkel, Proc. Natl. Acad. Sci. USA, 82, 488 (1985); Kunkel et al., Methods in Enzymol., 154, 367 (1987); U.S. Pat. No. 4,873,192; Walker and Gaastra, eds., Techniques in Molecular Biology, MacMillan Publishing Company, New York (1983) and the references cited therein. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the protein of interest may be found in the model of Dayhoff et al., Atlas of Protein Sequence and Structure, Natl. Biomed. Res. Found., Washington, D.C. (1978), herein incorporated by reference.

The derivatives and variants of the isolated polypeptides of the invention have identity with at least about 92% of the amino acid positions of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8 and have DNA polymerase I activity and/or are thermally stable. In a preferred embodiment, polypeptide derivatives and variants have identity with at least about 95% of the amino acid positions of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, or SEQ ID NO:8 and have DNA polymerase I activity and/or are thermally stable.
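
A short calculation makes the identity criterion above concrete. The sketch below works under stated assumptions. It compares two pre-aligned, ungapped sequences of equal length, and the helper names and the variant fragment are hypothetical. Identity between full-length polymerases is normally computed from an alignment, and this sketch does not perform one.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences carry the same residue.

    Assumes an ungapped, equal-length comparison; gapped alignments need a
    gap-handling convention that is not shown here.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 92.0) -> bool:
    """True if identity is at least `threshold` percent (92% or 95% in the text)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical variant window with two substitutions relative to residues 41-50.
print(percent_identity("VYGFAKSLLK", "VYDFAKSLLA"))  # 80.0
```

Consider a polypeptide of about 829 residues, like those listed in this section. For such a polypeptide, a 92% threshold tolerates roughly 66 non-identical positions (0.08 × 829 ≈ 66), and a 95% threshold tolerates roughly 41 (0.05 × 829 ≈ 41).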

Amino acid residues of the isolated polypeptides and polypeptide derivatives and variants can be genetically encoded L-amino acids, naturally occurring non-genetically encoded L-amino acids, synthetic L-amino acids or D-enantiomers of any of the above. The amino acid notations used herein for the twenty genetically encoded L-amino acids and common non-encoded amino acids are conventional and are as shown in Table 2.

TABLE 2
Amino Acid | One-Letter Symbol | Common Abbreviation
Alanine | A | Ala
Arginine | R | Arg
Asparagine | N | Asn
Aspartic acid | D | Asp
Cysteine | C | Cys
Glutamine | Q | Gln
Glutamic acid | E | Glu
Glycine | G | Gly
Histidine | H | His
Isoleucine | I | Ile
Leucine | L | Leu
Lysine | K | Lys
Methionine | M | Met
Phenylalanine | F | Phe
Proline | P | Pro
Serine | S | Ser
Threonine | T | Thr
Tryptophan | W | Trp
Tyrosine | Y | Tyr
Valine | V | Val
β-Alanine | | bAla
2,3-Diaminopropionic acid | | Dpr
α-Aminoisobutyric acid | | Aib
N-Methylglycine (sarcosine) | | MeGly
Ornithine | | Orn
Citrulline | | Cit
t-Butylalanine | | t-BuA
t-Butylglycine | | t-BuG
N-methylisoleucine | | MeIle
Phenylglycine | | Phg
Cyclohexylalanine | | Cha
Norleucine | | Nle
Naphthylalanine | | Nal
Pyridylalanine | |
3-Benzothienyl alanine | |
4-Chlorophenylalanine | | Phe(4-Cl)
2-Fluorophenylalanine | | Phe(2-F)
3-Fluorophenylalanine | | Phe(3-F)
4-Fluorophenylalanine | | Phe(4-F)
Penicillamine | | Pen
1,2,3,4-Tetrahydroisoquinoline-3-carboxylic acid | | Tic
β-2-thienylalanine | | Thi
Methionine sulfoxide | | MSO
Homoarginine | | hArg
N-acetyl lysine | | AcLys
2,4-Diaminobutyric acid | | Dbu
p-Aminophenylalanine | | Phe(pNH₂)
N-methylvaline | | MeVal
Homocysteine | | hCys
Homoserine | | hSer
ε-Amino hexanoic acid | | Aha
δ-Aminovaleric acid | | Ava
2,3-Diaminobutyric acid | | Dab

Polypeptide variants that are encompassed within the scope of the invention can have one or more amino acids substituted with an amino acid of similar chemical and/or physical properties, so long as these variant polypeptides retain nucleic acid polymerase or DNA polymerase activity and/or remain thermally stable. Derivative polypeptides can have one or more amino acids substituted with amino acids having different chemical and/or physical properties, so long as these derivative polypeptides retain nucleic acid polymerase or DNA polymerase activity and/or remain thermally stable.

Amino acids that are substitutable for each other in the present variant polypeptides generally reside within similar classes or subclasses. As known to one of skill in the art, amino acids can be placed into three main classes, depending primarily on the characteristics of the amino acid side chain: hydrophilic amino acids, hydrophobic amino acids and cysteine-like amino acids. These main classes may be further divided into subclasses. Hydrophilic amino acids include amino acids having acidic, basic or polar side chains, and hydrophobic amino acids include amino acids having aromatic or apolar side chains. Apolar amino acids may be further subdivided to include, among others, aliphatic amino acids. The definitions of the classes of amino acids as used herein are as follows:

“Hydrophobic Amino Acid” refers to an amino acid having a side chain that is uncharged at physiological pH and that is repelled by aqueous solution. Examples of genetically encoded hydrophobic amino acids include Ile, Leu and Val. Examples of non-genetically encoded hydrophobic amino acids include t-BuA.

“Aromatic Amino Acid” refers to a hydrophobic amino acid having a side chain containing at least one ring having a conjugated π-electron system (aromatic group). The aromatic group may be further substituted with substituent groups such as alkyl, alkenyl, alkynyl, hydroxyl, sulfonyl, nitro and amino groups, as well as others. Examples of genetically encoded aromatic amino acids include phenylalanine, tyrosine and tryptophan. Commonly encountered non-genetically encoded aromatic amino acids include phenylglycine, 2-naphthylalanine, β-2-thienylalanine, 1,2,3,4-tetrahydroisoquinoline-3-carboxylic acid, 4-chlorophenylalanine, 2-fluorophenylalanine, 3-fluorophenylalanine and 4-fluorophenylalanine.

“Apolar Amino Acid” refers to a hydrophobic amino acid having a side chain that is generally uncharged at physiological pH and that is not polar. Examples of genetically encoded apolar amino acids include glycine, proline and methionine. Examples of non-encoded apolar amino acids include Cha.

“Aliphatic Amino Acid” refers to an apolar amino acid having a saturated or unsaturated straight chain, branched or cyclic hydrocarbon side chain. Examples of genetically encoded aliphatic amino acids include Ala, Leu, Val and Ile. Examples of non-encoded aliphatic amino acids include Nle.

“Hydrophilic Amino Acid” refers to an amino acid having a side chain that is attracted by aqueous solution. Examples of genetically encoded hydrophilic amino acids include Ser and Lys. Examples of non-encoded hydrophilic amino acids include Cit and hCys.

“Acidic Amino Acid” refers to a hydrophilic amino acid having a side chain pK value of less than 7. Acidic amino acids typically have negatively charged side chains at physiological pH due to loss of a hydrogen ion. Examples of genetically encoded acidic amino acids include aspartic acid (aspartate) and glutamic acid (glutamate).

“Basic Amino Acid” refers to a hydrophilic amino acid having a side chain pK value of greater than 7. Basic amino acids typically have positively charged side chains at physiological pH due to association with hydronium ion. Examples of genetically encoded basic amino acids include arginine, lysine and histidine. Examples of non-genetically encoded basic amino acids include the non-cyclic amino acids ornithine, 2,3-diaminopropionic acid, 2,4-diaminobutyric acid and homoarginine.

“Polar Amino Acid” refers to a hydrophilic amino acid having a side chain that is uncharged at physiological pH, but which has a bond in which the pair of electrons shared in common by two atoms is held more closely by one of the atoms. Examples of genetically encoded polar amino acids include asparagine and glutamine. Examples of non-genetically encoded polar amino acids include citrulline, N-acetyl lysine and methionine sulfoxide.

“Cysteine-Like Amino Acid” refers to an amino acid having a side chain capable of forming a covalent linkage with a side chain of another amino acid residue, such as a disulfide linkage. Typically, cysteine-like amino acids have a side chain containing at least one thiol (SH) group. Examples of genetically encoded cysteine-like amino acids include cysteine. Examples of non-genetically encoded cysteine-like amino acids include homocysteine and penicillamine.

As will be appreciated by those having skill in the art, the above classifications are not absolute. Several amino acids exhibit more than one characteristic property and can therefore be included in more than one category. For example, tyrosine has both an aromatic ring and a polar hydroxyl group. Thus, tyrosine has dual properties and can be included in both the aromatic and polar categories. Similarly, in addition to being able to form disulfide linkages, cysteine also has apolar character. Thus, while not strictly classified as a hydrophobic or apolar amino acid, in many instances cysteine can be used to confer hydrophobicity to a polypeptide.

Certain commonly encountered amino acids that are not genetically encoded and that can be present, or substituted for an amino acid, in the variant polypeptides of the invention include, but are not limited to, β-alanine (b-Ala) and other omega-amino acids such as 3-aminopropionic acid (Dap), 2,3-diaminopropionic acid (Dpr), 4-aminobutyric acid and so forth; α-aminoisobutyric acid (Aib); ε-aminohexanoic acid (Aha); δ-aminovaleric acid (Ava); N-methylglycine (MeGly); ornithine (Orn); citrulline (Cit); t-butylalanine (t-BuA); t-butylglycine (t-BuG); N-methylisoleucine (MeIle); phenylglycine (Phg); cyclohexylalanine (Cha); norleucine (Nle); 2-naphthylalanine (2-Nal); 4-chlorophenylalanine (Phe(4-Cl)); 2-fluorophenylalanine (Phe(2-F)); 3-fluorophenylalanine (Phe(3-F)); 4-fluorophenylalanine (Phe(4-F)); penicillamine (Pen); 1,2,3,4-tetrahydroisoquinoline-3-carboxylic acid (Tic); β-2-thienylalanine (Thi); methionine sulfoxide (MSO); homoarginine (hArg); N-acetyl lysine (AcLys); 2,3-diaminobutyric acid (Dab); 2,4-diaminobutyric acid (Dbu); p-aminophenylalanine (Phe(pNH₂)); N-methyl valine (MeVal); homocysteine (hCys) and homoserine (hSer). These amino acids also fall into the categories defined above.

The classifications of the above-described genetically encoded and non-encoded amino acids are summarized in Table 3, below. It is to be understood that Table 3 is for illustrative purposes only and does not purport to be an exhaustive list of amino acid residues that may comprise the variant and derivative polypeptides described herein. Other amino acid residues that are useful for making the variant and derivative polypeptides described herein can be found, e.g., in Fasman, 1989, CRC Practical Handbook of Biochemistry and Molecular Biology, CRC Press, Inc., and the references cited therein. Amino acids not specifically mentioned herein can be conveniently classified into the above-described categories on the basis of known behavior and/or their characteristic chemical and/or physical properties as compared with amino acids specifically identified.

TABLE 3
Classification | Genetically Encoded | Genetically Non-Encoded
Hydrophobic | F, L, I, V |
Aromatic | F, Y, W | Phg, Nal, Thi, Tic, Phe(4-Cl), Phe(2-F), Phe(3-F), Phe(4-F), Pyridyl Ala, Benzothienyl Ala
Apolar | M, G, P |
Aliphatic | A, V, L, I | t-BuA, t-BuG, MeIle, Nle, MeVal, Cha, bAla, MeGly, Aib
Hydrophilic | S, K | Cit, hCys
Acidic | D, E |
Basic | H, K, R | Dpr, Orn, hArg, Phe(p-NH₂), DBU, A₂BU
Polar | Q, N, S, T, Y | Cit, AcLys, MSO, hSer
Cysteine-Like | C | Pen, hCys, β-methyl Cys

Polypeptides of the invention can have any amino acid substituted by any similarly classified amino acid to create a variant peptide, so long as the peptide variant is thermally stable and/or retains nucleic acid polymerase or DNA polymerase activity.
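
The rule that any residue may be replaced by a similarly classified residue can be checked against Table 3. The Python sketch below encodes only the genetically encoded column of Table 3, and the dictionary and function names are hypothetical. The sketch treats "shares at least one class" as "similarly classified". That reading is an interpretation of the text, not a definition taken from it.

```python
# Genetically encoded members of each class, transcribed from Table 3.
ENCODED_CLASSES = {
    "Hydrophobic": set("FLIV"),
    "Aromatic": set("FYW"),
    "Apolar": set("MGP"),
    "Aliphatic": set("AVLI"),
    "Hydrophilic": set("SK"),
    "Acidic": set("DE"),
    "Basic": set("HKR"),
    "Polar": set("QNSTY"),
    "Cysteine-Like": set("C"),
}


def shared_classes(aa1: str, aa2: str) -> set[str]:
    """Return the Table 3 classes that contain both one-letter residues."""
    return {name for name, members in ENCODED_CLASSES.items()
            if aa1 in members and aa2 in members}


def is_similarly_classified(aa1: str, aa2: str) -> bool:
    """Assumed reading: a substitution is 'similar' if any class is shared."""
    return bool(shared_classes(aa1, aa2))


print(sorted(shared_classes("I", "V")))   # ['Aliphatic', 'Hydrophobic']
print(is_similarly_classified("F", "Y"))  # True (both aromatic)
print(is_similarly_classified("G", "D"))  # False (apolar vs. acidic)
```

Under this reading, F665Y stays within the aromatic class. G43D moves from an apolar residue to an acidic one. Both substitutions appear in SEQ ID NO:15.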

“Domain shuffling” or construction of “thermostable chimeric DNA polymerases” may be used to provide thermostable DNA polymerases containing novel properties. For example, placement of codons 289-422 from the Thermus brockianus polymerase coding sequence after codons 1-288 of the Thermus aquaticus DNA polymerase would yield a novel thermostable polymerase with three domains: the 5′-3′ exonuclease domain of Thermus aquaticus DNA polymerase (1-288), the 3′-5′ exonuclease domain of Thermus brockianus polymerase (289-422), and the DNA polymerase domain of Thermus aquaticus DNA polymerase (423-832). Alternatively, the 5′-3′ exonuclease domain and the 3′-5′ exonuclease domain of Thermus brockianus polymerase may be fused to the DNA polymerase portions (dNTP binding and primer/template binding domains) of Thermus aquaticus DNA polymerase (about codons 423-832). The donors and recipients need not be limited to Thermus aquaticus and Thermus brockianus polymerases. Thermus thermophilus DNA polymerase 3′-5′ exonuclease, 5′-3′ exonuclease and DNA polymerase domains can similarly be exchanged for those in the Thermus brockianus polymerases of the invention.
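
At the sequence level, the domain-shuffling example amounts to concatenating residue ranges taken from different donor sequences. The sketch below is a hypothetical helper, not a described construction protocol. In the text the fusion is made at the level of codons. `make_chimera` works equally on coding DNA if each codon range n1-n2 is first converted to nucleotides 3n1-2 through 3n2. The toy ten-residue strings stand in for the real polymerases. For the example in the text, the segments would be (Taq, 1, 288), (T. brockianus, 289, 422) and (Taq, 423, 832). In practice, those boundaries would be checked against an alignment.

```python
def make_chimera(segments: list[tuple[str, int, int]]) -> str:
    """Concatenate residue ranges (1-based, inclusive) taken from donor sequences.

    Each segment is (donor_sequence, first_residue, last_residue).
    """
    pieces = []
    for donor, first, last in segments:
        if not 1 <= first <= last <= len(donor):
            raise ValueError(f"range {first}-{last} outside donor of length {len(donor)}")
        pieces.append(donor[first - 1:last])  # convert 1-based inclusive to a 0-based slice
    return "".join(pieces)


# Toy stand-ins (hypothetical): uppercase for the Taq-like donor, lowercase for the other.
taq_like = "ABCDEFGHIJ"
brockianus_like = "abcdefghij"

# Same layout as the text's example, scaled down to residues 1-3, 4-6 and 7-10.
print(make_chimera([(taq_like, 1, 3), (brockianus_like, 4, 6), (taq_like, 7, 10)]))
# ABCdefGHIJ
```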

For example, it has been demonstrated that the exonuclease domain of Thermus aquaticus Polymerase I can be removed from the amino terminus of the protein without a significant loss of thermostability or polymerase activity (Erlich et al., (1991) Science 252: 1643-1651; Barnes, W. M., (1992) Gene 112:29-35; Lawyer et al., (1989) JBC 264:6427-6437). Other N-terminal deletions similarly have been shown to maintain thermostability and activity (Vainshtein et al., (1996) Protein Science 5:1785-1792 and references therein). Therefore, this invention also includes similarly truncated forms of any of the wild type or variant polymerases provided herein. For example, the invention is also directed to an active truncated variant of any of the polymerases provided by the invention in which the first 330 amino acids are removed. Moreover, the invention provides SEQ ID NO:56, a truncated form of a polymerase in which the N-terminal 289 amino acids have been removed from the wild type Thermus brockianus polymerase from strain YS38.

290                                           Q AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDLLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINFGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829
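
The truncated sequences keep the wild type numbering. Removing the N-terminal 289 residues therefore leaves a polypeptide whose first residue is Q290. The sketch below shows that bookkeeping using a hypothetical helper name. The example input is a 20-residue window (residues 281-300) copied from the sequences in this section.

```python
def truncate_n_terminus(sequence: str, n_removed: int, start: int = 1) -> tuple[str, int]:
    """Drop the first `n_removed` residues.

    `start` is the residue number of the first character of `sequence`.
    Returns (truncated_sequence, number_of_new_first_residue).
    """
    if not 0 <= n_removed < len(sequence):
        raise ValueError("n_removed must leave at least one residue")
    return sequence[n_removed:], start + n_removed


# Residues 281-300; removing 281-289 (nine residues) leaves a sequence starting at Q290.
truncated, first = truncate_n_terminus("HEFGLLESPQAAEEAPWPPP", 9, start=281)
print(first, truncated)  # 290 QAAEEAPWPPP
```

Applied to a full-length sequence numbered from 1 with `n_removed=289`, the same call returns a sequence whose first residue is numbered 290.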

In another embodiment, the invention provides SEQ ID NO:57, a truncated form of a polymerase in which the N-terminal 289 amino acids have been removed from the Thermus brockianus polymerase from strain 2AZN.

290                                           Q AAEEAPWPPP 300
301 EGAFLGFRLS RPEPMWAELL SLAASAKGRV YRAEAPHKAL SDLKEIRGLL 350
351 AKDLAVLALR EGLGLPPTDD PMLLAYLLDP SNTTPEGVAR RYGGEWTEEA 400
401 GERALLAERL YENLLSRLKG EEKLLWLYEE VEKPLSRVLA HMEATGVRLD 450
451 VPYLRALSLE VAAEMGRLEE EVFRLAGHPF NLNSRDQLER VLFDELGLPP 500
501 IGKTEKTGKR STSAAVLEAL REAHPIVEKI LQYRELAKLK GTYIDPLPAL 550
551 VHPRTGRLHT RFNQTATATG RLSSSDPNLQ NIPVRTPLGQ RIRRAFVAEE 600
601 GYLLVALDYS QIELRVLAHL SGDENLIRVF QEGRDIHTQT ASWMFGLPAE 650
651 AIDPLRRRAA KTINFGVLYG MSAHRLSQEL GIPYEEAVAF IDRYFQSYPK 700
701 VKAWIERTLE EGRQRGYVET LFGRRRYVPD LNARVKSVRE AAERMAFNMP 750
751 VQGTAADLMK LAMVRLFPRL PEVGARMLLQ VHDELLLEAP KERAEEAAAL 800
801 AKEVMEGVWP LAVPLEVEVG IGEDWLSAKG 829

Thus, the polypeptides of the invention encompass both naturally occurring proteins as well as variations, truncations and modified forms thereof. Such variants will continue to possess the desired activity. The deletions, insertions, and substitutions of the polypeptide sequence encompassed herein are not expected to produce radical changes in the characteristics of the polypeptide. One skilled in the art can readily evaluate the thermal stability, nucleic acid polymerase or DNA polymerase activity of the polypeptides and variant polypeptides of the invention by routine screening assays.

Kits and compositions containing the present polypeptides are substantially free of cellular material. Such preparations and compositions have less than about 30%, 20%, 10%, or 5% (by dry weight) of contaminating bacterial cellular protein.

The activity of polymerase polypeptides and variant polypeptides can be assessed by any procedure known to one of skill in the art. For example, the DNA synthetic activity of the variant and non-variant polymerase polypeptides of the invention can be tested in standard DNA sequencing or DNA primer extension reactions. One such assay can be performed in a 100 μl (final volume) reaction mixture. The mixture contains, for example, 0.1 mM dCTP, dTTP, dGTP and α-³²P-dATP, 0.3 mg/ml activated calf thymus DNA and 0.5 mg/ml BSA in a buffer containing 50 mM KCl, 1 mM DTT, 10 mM MgCl₂ and 50 mM of a buffering compound such as PIPES, Tris or triethylamine. A dilution of each polymerase enzyme to 0.1 units/μl is prepared, and 5 μl of this dilution is added to the reaction mixture, followed by incubation at 60° C. for 10 minutes. Reaction products can be detected by determining the amount of ³²P incorporated into DNA or by observing the products after separation on a polyacrylamide gel.
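
The amounts in this assay follow from simple dilution arithmetic. In the sketch below the helper names are hypothetical. The stock concentrations (10 mg/ml BSA, 1000 mM MgCl₂) are assumed values, chosen only to illustrate the C1V1 = C2V2 relationship, because the text does not specify stocks.

```python
def enzyme_units_added(dilution_units_per_ul: float, volume_ul: float) -> float:
    """Units of polymerase delivered by `volume_ul` of a dilution."""
    return dilution_units_per_ul * volume_ul


def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1; both concentrations must use the same unit."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be at least as concentrated as the final mixture")
    return final_conc * final_volume_ul / stock_conc


print(enzyme_units_added(0.1, 5))      # 0.5 units per 100 ul reaction
print(stock_volume_ul(10, 0.5, 100))   # 5.0 ul of an assumed 10 mg/ml BSA stock
print(stock_volume_ul(1000, 10, 100))  # 1.0 ul of an assumed 1000 mM MgCl2 stock
```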

Uses for Nucleic Acid Polymerases

The thermostable enzyme of this invention may be used for any purpose in which DNA or RNA polymerase enzyme activity is necessary or desired. For example, the present polymerases can be used in one or more of the following procedures: DNA sequencing, DNA amplification, RNA amplification, reverse transcription, DNA synthesis and/or primer extension. The polymerases of the invention can be used to amplify DNA by polymerase chain reaction (PCR). The polymerases of the invention can be used to sequence DNA by Sanger sequencing procedures. The polymerases of the invention can also be used in primer extension reactions. The polymerases of the invention can be used to test for single nucleotide polymorphisms (SNPs) by single nucleotide primer extension using terminator nucleotides. Any such procedures and related procedures, for example, polynucleotide or primer labeling, minisequencing and the like, are contemplated for use with the present polymerases.

Methods of the invention comprise the step of extending a primed polynucleotide template with at least one labeled nucleotide, wherein the extension is catalyzed by a polymerase of the invention. DNA polymerases used for Sanger sequencing can produce fluorescently labeled products that are analyzed on an automated fluorescence-based sequencing apparatus such as an Applied Biosystems 310 or 377 (Applied Biosystems, Foster City, Calif.). Detailed protocols for Sanger sequencing are known to those skilled in the art and may be found, for example, in Sambrook et al., Molecular Cloning, A Laboratory Manual, Second Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989).

In one embodiment, the polymerases of the invention are used for DNA amplification. Any DNA amplification procedure that employs a DNA polymerase can be used, for example, polymerase chain reaction (PCR) assays, strand displacement amplification and other amplification procedures. Strand displacement amplification can be used as described in Walker et al. (1992) Nucl. Acids Res. 20, 1691-1696. The term “polymerase chain reaction” (“PCR”) refers to the method of K. B. Mullis, U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,965,188, hereby incorporated by reference, which describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic or other DNA without cloning or purification.

The PCR process for amplifying a target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double-stranded target sequence. To carry out amplification, the mixture is denatured and the primers are annealed to complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension are termed a “cycle.” There can be numerous cycles, and the amount of amplified DNA produced increases with each cycle. Hence, to obtain a high concentration of an amplified target nucleic acid, many cycles are performed.
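
The statement that the amount of product increases with each cycle is often idealized as exponential growth, N = N0(1 + E)^n. Here E is the per-cycle efficiency, and E = 1 corresponds to perfect doubling. This model is a standard textbook idealization and is not stated in the text. The helper below is a hypothetical illustration that ignores the plateau reached in late cycles.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the target by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; real reactions fall below this and
    eventually plateau, which this simple model does not capture.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(expected_copies(1, 10))          # 1024.0 (ten perfect doublings)
print(expected_copies(100, 30, 0.9))   # lower yield at an assumed 90% efficiency
```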

The steps involved in the PCR nucleic acid amplification method are described in more detail below. For ease of discussion, the nucleic acid to be amplified is described as being double-stranded. However, the process is equally useful for amplifying a single-stranded nucleic acid, such as an mRNA, although the ultimate product is generally double-stranded DNA. In the amplification of a single-stranded nucleic acid, the first step involves the synthesis of a complementary strand using, for example, one of the two amplification primers. The succeeding steps generally proceed as follows:

(a) Each nucleic acid strand is contacted with four different nucleoside triphosphates and one oligonucleotide primer for each nucleic acid strand to be amplified. Each primer is selected to be substantially complementary to a portion of the nucleic acid strand to be amplified, such that the extension product synthesized from one primer, when it is separated from its complement, can serve as a template for synthesis of the extension product of the other primer. To promote the proper annealing of primer(s) and the nucleic acid strands to be amplified, a temperature that allows hybridization of each primer to a complementary nucleic acid strand is used.

(b) After primer annealing, a polymerase is used for primer extension that incorporates the nucleoside triphosphates into a growing nucleic acid strand that is complementary to the strand hybridized by the primer. In general, this primer extension reaction is performed at a temperature and for a time effective to promote the activity of the enzyme and to synthesize a “full length” complementary nucleic acid strand that extends into and through a complete second primer binding site. However, the temperature is not so high as to separate each extension product from its nucleic acid template strand.

(c) The mixture from step (b) is then heated for a time and at a temperature sufficient to separate the primer extension products from their complementary templates. The temperature chosen is not so high as to irreversibly denature the polymerase present in the mixture.

(d) The mixture from step (c) is cooled for a time and at a temperature effective to promote hybridization of a primer to each of the single-stranded molecules produced in step (b).

(e) The mixture from step (d) is maintained at a temperature and for a time sufficient to promote primer extension by polymerase to produce a “full length” extension product. The temperature used is not so high as to separate each extension product from the complementary strand template. Steps (c)-(e) are repeated until the desired level of amplification is obtained.
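
Steps (c)-(e) form a repeated program that can be written down as data. The sketch below is hypothetical. The temperatures and hold times (95° C, 55° C and 72° C; 30, 30 and 60 seconds) are common illustrative choices that the text does not specify. The code applies only the checks stated in this section: a denaturation temperature of about 90° C to 105° C, and annealing and extension temperatures below the denaturation temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in a thermal-cycling program (hypothetical structure)."""
    name: str
    temp_c: float
    seconds: int


def build_cycling_program(cycles: int, denature: Step, anneal: Step, extend: Step) -> list[Step]:
    """Repeat steps (c)-(e) (denature, anneal, extend) `cycles` times.

    Only constraints stated in the text are checked.
    """
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    if not 90 <= denature.temp_c <= 105:
        raise ValueError("denaturation is typically about 90-105 C")
    if anneal.temp_c >= denature.temp_c or extend.temp_c >= denature.temp_c:
        raise ValueError("annealing and extension must be cooler than denaturation")
    return [denature, anneal, extend] * cycles


# Hypothetical values; the text does not prescribe temperatures or hold times.
program = build_cycling_program(
    30,
    Step("denature", 95, 30),
    Step("anneal", 55, 30),
    Step("extend", 72, 60),
)
print(len(program))                     # 90
print(sum(s.seconds for s in program))  # 3600
```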

The amplification method is useful not only for producing large amounts of a specific nucleic acid sequence of known sequence but also for producing nucleic acid sequences that are known to exist but are not completely specified. One need know only the identity of a sufficient number of bases at both ends of the sequence in sufficient detail so that two oligonucleotide primers can be prepared that will hybridize to different strands of the desired sequence at those positions. An extension product is synthesized from one primer. When that extension product is separated from the template, the extension product can serve as a template for extension of the other primer. The greater the knowledge about the bases at both ends of the sequence, the greater can be the specificity of the primers for the target nucleic acid sequence.

Thermally stable DNA polymerases are therefore generally used for PCR because they can function at the high temperatures used for melting double-stranded target DNA and annealing the primers during each cycle of the PCR reaction. High temperature results in thermodynamic conditions that favor primer hybridization with the target sequences and not hybridization with non-target sequences (H. A. Erlich (ed.), PCR Technology, Stockton Press [1989]).

The thermostable polymerases of the present invention satisfy the requirements for effective use in amplification reactions such as PCR. The present polymerases do not become irreversibly denatured (inactivated) when subjected to the required elevated temperatures for the time necessary to melt double-stranded nucleic acids during the amplification process. Irreversible denaturation for purposes herein refers to permanent and complete loss of enzymatic activity. The heating conditions necessary for nucleic acid denaturation will depend, e.g., on the buffer salt concentration and the composition and length of the nucleic acids being denatured. These conditions typically range from about 90° C. to about 105° C., for a time that depends mainly on the temperature and the nucleic acid length, typically from a few seconds up to four minutes. Higher temperatures may be required as the buffer salt concentration and/or GC composition of the nucleic acid is increased. The polymerases of the invention do not become irreversibly denatured by relatively short exposures to temperatures of about 90° C. to 100° C.

The thermostable polymerases of the invention have an optimum temperature at which they function that is higher than about 45° C. Temperatures below 45° C. facilitate hybridization of primer to template. However, depending on salt composition and concentration and on primer composition and length, hybridization of primer to template can occur at higher temperatures (e.g., 45° C. to 70° C.), which may promote specificity of the primer hybridization reaction. The DNA polymerase polypeptides of the invention exhibit activity over a broad temperature range from about 37° C. to about 90° C.

The present polymerases have particular utility for PCR not only because of their thermal stability but also because of their fidelity in replicating the target nucleic acid. With PCR, it is possible to amplify a single copy of a specific target nucleic acid to a level detectable by several different methodologies. However, if the sequence of the target nucleic acid is not replicated with fidelity, then the amplified product can comprise a pool of nucleic acids with diverse sequences. Hence, a polymerase that can accurately replicate the sequence of the target is highly desirable.

Any nucleic acid can act as a “target nucleic acid” for the PCR methods of the invention. The term “target,” when used in reference to the polymerase chain reaction, refers to the region of nucleic acid bounded by the primers used for polymerase chain reaction. In addition to genomic DNA, any cDNA, oligonucleotide or polynucleotide can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process are themselves efficient templates for subsequent PCR amplifications. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore this length is a controllable parameter.
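
The product is bounded by the primers, so its length follows directly from where they bind. The sketch below assumes exact matches and a single top-strand template written 5′ to 3′. Real primers can tolerate mismatches, and the helper names and sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the segment bounded by the two primers, or None if either site is missing.

    `forward` appears verbatim in the top strand; the reverse primer anneals to the
    top strand, so its reverse complement appears downstream of the forward site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "CCCCATGGCTAGCTAGGATCCTTTT"  # hypothetical top strand, 5'->3'
product = predicted_amplicon(template, forward="ATGGC", reverse="GATCCT")
print(product, len(product))  # ATGGCTAGCTAGGATC 16
```

Moving either primer site changes the length of the returned segment. This matches the point above that product length is a controllable parameter.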

The amplified target nucleic acid can be detected by any method known to one of skill in the art. For example, target nucleic acids are often amplified to such an extent that they form a band visible on a size separation gel. Target nucleic acids can also be detected by hybridization with a labeled probe; by incorporation of biotinylated primers during PCR followed by avidin-enzyme conjugate detection; by incorporation of ³²P-labeled deoxynucleotide triphosphates during PCR; and the like.

The amount of amplification can also be monitored, for example, by use of a reporter-quencher oligonucleotide as described in U.S. Pat. No. 5,723,591, and a polymerase of the invention that has 5′-3′ nuclease activity. The reporter-quencher oligonucleotide has an attached reporter molecule and an attached quencher molecule that is capable of quenching the fluorescence of the reporter molecule when the two are in proximity. Quenching occurs when the reporter-quencher oligonucleotide is not hybridized to a complementary nucleic acid, because the reporter molecule and the quencher molecule tend to be in proximity or at an optimal distance for quenching. When hybridized, the reporter-quencher oligonucleotide emits more fluorescence than when unhybridized, because the reporter molecule and the quencher molecule tend to be further apart. To monitor amplification, the reporter-quencher oligonucleotide is designed to hybridize 3′ to an amplification primer. During amplification, the 5′-3′ nuclease activity of the polymerase digests the reporter oligonucleotide probe, thereby separating the reporter molecule from the quencher molecule. As the amplification is conducted, the fluorescence of the reporter molecule increases. Accordingly, the amount of amplification performed can be quantified based on the increase of fluorescence observed.
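
One widely used way to turn the rising reporter signal into a number is to record the cycle at which fluorescence first crosses a fixed threshold. This cycle is often called the threshold cycle, Ct, and samples with more starting target cross earlier. The text does not prescribe this method, so the sketch below is an illustration under assumptions. The readings are made-up values and the threshold is arbitrary.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first cycle (1-based) whose reading exceeds `threshold`, or None.

    The threshold is an assumption that would have to be set for each assay.
    """
    for cycle, reading in enumerate(fluorescence, start=1):
        if reading > threshold:
            return cycle
    return None


readings = [0.10, 0.11, 0.10, 0.12, 0.15, 0.25, 0.48, 0.95, 1.70, 2.60]  # made-up data
print(threshold_cycle(readings, 0.5))  # 8
```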

Oligonucleotides used for PCR primers are usually about 9 to about 75 nucleotides in length, preferably about 17 to about 50 nucleotides. Preferably, an oligonucleotide for use in PCR reactions is about 40 or fewer nucleotides in length (e.g., 9, 12, 15, 18, 20, 21, 24, 27, 30, 35, 40, or any number between 9 and 40). Generally, specific primers are at least about 14 nucleotides in length. For optimum specificity and cost effectiveness, primers of 16-24 nucleotides in length are generally preferred.
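
The length guidance above can be expressed as a set of checks. The helper name and dictionary keys below are hypothetical, and the ranges are copied from the paragraph. The text says "about", so the boundaries should be treated as soft.

```python
def primer_length_checks(primer: str) -> dict[str, bool]:
    """Compare a primer's length with the ranges quoted in the text."""
    n = len(primer)
    return {
        "usual_9_to_75": 9 <= n <= 75,
        "preferred_17_to_50": 17 <= n <= 50,
        "at_most_40": n <= 40,
        "specific_at_least_14": n >= 14,
        "optimal_16_to_24": 16 <= n <= 24,
    }


checks = primer_length_checks("ATGGCTAGCTAGGATCCTTG")  # hypothetical 20-mer
print(all(checks.values()))  # True
```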

Those skilled in the art can readily design primers for use in processes such as PCR. For example, potential primers for nucleic acid amplification can be used as probes to determine whether the primer is selective for a single target and what conditions permit hybridization of a primer to a target within a sample or complex mixture of nucleic acids.

The present invention also contemplates use of the present polymerase polypeptides in combination with other procedures or enzymes. For example, the polymerase polypeptides have reverse transcription activity and can be used for reverse transcription of an RNA. In this method, the RNA is converted to cDNA by the reverse transcriptase activity of the polymerase, and the cDNA is then amplified using the polymerizing activity of one of the thermostable polymerases of the invention. Additional reverse transcriptase enzyme may be added as needed. Such procedures are provided in U.S. Pat. No. 5,322,770, incorporated by reference herein.
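
At the sequence level, the first-strand cDNA made during reverse transcription is the reverse complement of the RNA template, with thymine in place of uracil. The sketch below models only that base-pairing relationship, not the enzymology; the function name is hypothetical.

```python
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(rna):
    """Return the cDNA (written 5'->3') complementary to an RNA template
    that is also written 5'->3'."""
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("RNA may contain only A, C, G and U")
    # Complement each base (U pairs with A), then reverse for antiparallel 5'->3'.
    return rna.translate(RNA_TO_CDNA)[::-1]


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```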

In another embodiment, polymerases of the invention with 5′-3′ exonuclease activity are used to detect target nucleic acids in an invader-directed cleavage assay. This type of assay is described, for example, in U.S. Pat. No. 5,994,069. It is important to note that the 5′-3′ exonuclease of these polymerases is not really an exonuclease that progressively cleaves nucleotides from the 5′ end of a nucleic acid, but rather a nuclease that can cleave certain types of nucleic acid structures to produce oligonucleotide cleavage products. Such cleavage is sometimes called structure-specific cleavage.

In general, the invader-directed cleavage assay employs at least one pair of oligonucleotides that interact with a target nucleic acid to form a cleavage structure for the 5′-3′ nuclease activity of the polymerase. Distinctive cleavage products are released when the cleavage structure is cleaved by the 5′-3′ nuclease activity of the DNA polymerase. Formation of such a target-dependent cleavage structure and the resulting cleavage products is indicative of the presence of specific target nucleic acid sequences in the test sample.

Therefore, the invader-directed cleavage procedure requires the 5′-3′ nuclease activity of the present polymerases as well as at least one pair of oligonucleotides that interact with a target nucleic acid to form a cleavage structure for the 5′-3′ nuclease. The first oligonucleotide, sometimes termed the “probe,” can hybridize within the target site but downstream of a second oligonucleotide, sometimes termed an “invader” oligonucleotide. The invader oligonucleotide can hybridize adjacent to and upstream of the probe oligonucleotide. However, the target sites to which the probe and invader oligonucleotides hybridize overlap, such that the 3′ segment of the invader oligonucleotide overlaps with the 5′ segment of the probe oligonucleotide. The 5′-3′ nuclease of the present polymerases can cleave the probe oligonucleotide at an internal site to produce distinctive fragments that are diagnostic of the presence of the target nucleic acid in a sample. Further details and methods for adapting the invader-directed cleavage assay to particular situations can be found in U.S. Pat. No. 5,994,069.
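
The required geometry (invader upstream of the probe, with the invader's 3′ segment overlapping the probe's 5′ segment on the target) can be checked with a few lines of code. This is a simplified sketch: it assumes both oligonucleotides are perfectly complementary to the target, ignores any non-complementary 5′ flap on the probe, and uses hypothetical names. Positions are measured on the strand whose sequence the oligonucleotides copy, so "upstream" simply means "further left".

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def invader_overlap(target, invader, probe):
    """Return the number of nucleotides by which the invader's 3' segment
    overlaps the probe's 5' segment when both anneal to `target`
    (all sequences written 5'->3'). Returns 0 if they do not overlap."""
    sense = reverse_complement(target)  # strand with the same sequence as the oligos
    i = sense.find(invader.upper())
    p = sense.find(probe.upper())
    if i == -1 or p == -1:
        raise ValueError("an oligonucleotide does not anneal to the target")
    invader_end = i + len(invader)  # position just past the invader's 3' end
    # Invader must start upstream of the probe and end inside the probe.
    if i < p < invader_end <= p + len(probe):
        return invader_end - p
    return 0
```

A nonzero result indicates an overlapping structure of the kind the 5′-3′ nuclease recognizes; a result of 0 means the two oligonucleotides merely abut or are separated.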

One or more nucleotide analogs can also be used with the present methods, kits and polymerases. Such nucleotide analogs can be modified or non-naturally occurring nucleotides such as 7-deaza purines (i.e., 7-deaza-dATP and 7-deaza-dGTP). Nucleotide analogs include base analogs and comprise modified forms of deoxyribonucleotides as well as ribonucleotides. As used herein, the term “nucleotide analog,” when used in reference to targets present in a PCR mixture, refers to the use of nucleotides other than dATP, dGTP, dCTP and dTTP; thus, the use of dUTP (a naturally occurring dNTP) in a PCR would comprise the use of a nucleotide analog in the PCR. A PCR product generated using dUTP, 7-deaza-dATP, 7-deaza-dGTP or any other nucleotide analog in the reaction mixture is said to contain nucleotide analogs.
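
The definition above is purely exclusionary: anything other than the four standard dNTPs counts as an analog, including naturally occurring dUTP. A minimal sketch of that rule, with hypothetical function names:

```python
STANDARD_DNTPS = frozenset({"dATP", "dGTP", "dCTP", "dTTP"})


def nucleotide_analogs(mixture):
    """Return, sorted, the nucleotides in `mixture` that count as analogs,
    i.e. anything other than dATP, dGTP, dCTP and dTTP."""
    return sorted(set(mixture) - STANDARD_DNTPS)


def contains_analogs(mixture):
    return bool(nucleotide_analogs(mixture))


print(nucleotide_analogs(["dATP", "dGTP", "dCTP", "dUTP"]))  # ['dUTP']
```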

The invention also provides kits that contain at least one of the polymerases of the invention. Individual kits may be adapted for performing one or more of the following procedures: DNA sequencing, DNA amplification, RNA amplification and/or primer extension. Kits of the invention comprise a DNA polymerase polypeptide of the invention and at least one nucleotide. A nucleotide provided in the kits of the invention can be labeled or unlabeled. Kits can also preferably contain instructions on how to perform the procedures for which the kits are adapted.

Optionally, the subject kit may further comprise at least one other reagent required for performing the method the kit is adapted to perform. Examples of such additional reagents include: another unlabeled nucleotide, another labeled nucleotide, a balanced mixture of nucleotides, one or more chain-terminating nucleotides, one or more nucleotide analogs, buffer solution(s), magnesium solution(s), cloning vectors, restriction endonucleases, sequencing primers, reverse transcriptase, and DNA or RNA amplification primers. The reagents included in the kits of the invention may be supplied in premeasured units so as to provide for greater precision and accuracy. Typically, kit reagents and other components are placed and contained in separate vessels. A reaction vessel, test tube, microwell tray, microtiter dish or other container can also be included in the kit. Different labels can be used on different reagents so that each reagent can be distinguished from another.
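
The required and optional kit components described in the two preceding paragraphs can be captured as a small data structure. This is an illustrative model only, assuming string labels for components; the class and field names are hypothetical and the validation simply enforces the two required components (a polymerase and at least one nucleotide) named above.

```python
from dataclasses import dataclass, field

# Procedures named in the text; the kit model itself is hypothetical.
PROCEDURES = frozenset({"DNA sequencing", "DNA amplification",
                        "RNA amplification", "primer extension"})


@dataclass
class Kit:
    polymerase: str                 # a DNA polymerase polypeptide of the invention
    nucleotides: list[str]          # at least one, labeled or unlabeled
    procedures: set[str]            # procedures the kit is adapted to perform
    other_reagents: list[str] = field(default_factory=list)  # optional extras
    has_instructions: bool = True

    def __post_init__(self):
        # Enforce the required components and a known set of procedures.
        if not self.polymerase:
            raise ValueError("a kit must contain a DNA polymerase polypeptide")
        if not self.nucleotides:
            raise ValueError("a kit must contain at least one nucleotide")
        if not self.procedures:
            raise ValueError("a kit must be adapted to at least one procedure")
        unknown = set(self.procedures) - PROCEDURES
        if unknown:
            raise ValueError(f"unsupported procedure(s): {sorted(unknown)}")


kit = Kit("T. brockianus polymerase", ["dATP", "ddGTP"], {"DNA sequencing"},
          other_reagents=["sequencing primer", "buffer"])
```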

The following Examples further illustrate the invention and are not intended to limit the scope of the invention.

EXAMPLE 1 Cloning of Thermus brockianus Nucleic Acid Polymerases

Bacterial Growth and Genomic DNA Isolation

The 2AZN strain of Thermus brockianus used in this invention was obtained from Dr. R.A.D. Williams, Queen Mary and Westfield College, London, England. Strain YS38 was obtained from the NCIMB collection. Both of these bacterial samples were obtained as lyophilized bacteria and were revived in 4 ml of ATCC Thermus bacteria growth medium 461 (Castenholz TYE medium). The 4-ml overnight cultures were grown at 65° C. in a water bath orbital shaker. The 4-ml cultures were then transferred to 200 ml of TYE and grown overnight at 65° C. in a water bath orbital shaker to stationary phase. Thermus brockianus genomic DNAs were prepared using a Qiagen genomic DNA preparation kit (Qiagen, Valencia, Calif.).

Cloning of the Thermus brockianus Polymerase Genes

The forward and reverse primers were designed by analysis of 5′ and 3′ terminal homologous conserved regions of the GenBank DNA sequences of the DNA Pol I genes from Thermus aquaticus (Taq), Thermus thermophilus (Tth), Thermus filiformis (Tfi), Thermus caldophilus, and Thermus flavus. The Thermus brockianus polymerase gene from strain YS38 was first cloned as a partial fragment, which was amplified using N-terminal primer 5′-ggc cac cac ctg gcc tac-3′ (SEQ ID NO:50) and C-terminal primer 5′-ccc acc tcc acc tcc ag-3′ (SEQ ID NO:51). The PCR reaction mixture contained 2.5 μl of 10× Pfu Turbo reaction buffer (Stratagene), 50 ng genomic DNA template, 0.2 mM (each) dNTPs, 20 pmol of each primer, and 10 units of Pfu Turbo DNA polymerase (Stratagene) in a 25 μl total reaction volume. The reaction was started by adding a premix containing enzyme, MgCl₂, dNTPs, buffer and water to another premix containing primer and template preheated at 80° C. The entire reaction mixture was then denatured (30 sec, 96° C.), followed by 30 PCR cycles (97° C. for 3 sec, 56° C. for 30 sec, 72° C. for 2 min 30 sec) with a finishing step (72° C. for 6 min). This produced an amplified DNA fragment of approximately 2.3 kb. This amplified DNA fragment was purified from the PCR reaction mix using a Qiagen PCR cleanup kit (Qiagen). The fragment was then ligated into the inducible expression vector pCR®T7 CT-TOPO® (Invitrogen, Carlsbad, Calif.).
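
A thermocycling program such as the one above can be written down as data, which makes it easy to compare programs or estimate run time. This sketch is an assumption-laden illustration: the `Step` class and helper are hypothetical, and the total counts only programmed hold times, excluding temperature ramps and the 80° C. preheat of the premix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in degrees C
    seconds: int    # hold duration


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Total programmed hold time, excluding ramping between temperatures."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


# Partial-fragment amplification program for strain YS38, as described above.
YS38_PARTIAL = dict(
    initial=[Step(96, 30)],
    cycle=[Step(97, 3), Step(56, 30), Step(72, 150)],
    n_cycles=30,
    final=[Step(72, 360)],
)

total = hold_time_seconds(**YS38_PARTIAL)
print(total, "s =", total / 60, "min")  # 5880 s = 98.0 min
```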

The sequence of the full-length Thermus brockianus strain YS38 open reading frame and flanking regions was obtained by genomic DNA sequencing using primers designed to hybridize to portions of the Thermus brockianus strain YS38 Polymerase I gene. The C-terminal end was sequenced using the forward primer 5′-cga cct caa cgc ccg ggt aaa ga-3′ (SEQ ID NO:52). The N-terminal end was sequenced using the reverse primer 5′-gct ttt ggc gaa gcc gta gac ccc t-3′ (SEQ ID NO:53). The sequencing reactions were performed using a pre-denaturation step (95° C., 5 min) followed by 60 cycles (97° C. for 5 sec, 60° C. for 4 min). The reaction mixture consisted of 16 μl Big Dye V1 Ready Reaction mix, 2.4 μg DNA and 15 pmol primer in a 40 μl reaction volume. The sequences of the 5′ (start) and 3′ (end) regions of the Thermus brockianus YS38 gene were thus obtained.
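
The final concentrations in the sequencing reaction follow from dividing each component amount by the 40 μl reaction volume (1 pmol/μl equals 1 μM). A minimal sketch of that arithmetic, with a hypothetical helper name and ASCII "ul" in place of μl:

```python
def final_concentrations(components, volume_ul):
    """Return each component's amount per microliter of reaction.

    `components` maps a name to (amount, unit); the unit is carried
    through as a '<unit>/ul' label. Note that 1 pmol/ul equals 1 uM.
    """
    if volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return {name: (amount / volume_ul, f"{unit}/ul")
            for name, (amount, unit) in components.items()}


SEQ_REACTION = {
    "BigDye mix": (16, "ul"),
    "template DNA": (2400, "ng"),  # 2.4 ug
    "primer": (15, "pmol"),
}
conc = final_concentrations(SEQ_REACTION, 40)
```

This gives 0.375 μM primer and 60 ng/μl template, with the Big Dye mix making up 40% of the reaction volume.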

Using the sequence information obtained in the genomic DNA sequencing reactions above, two primers were designed to amplify the full-length Thermus brockianus YS38 polymerase gene: N-terminal primer 5′-cat atg ctt ccc ctc ttt gag ccc a-3′ (SEQ ID NO:54) and C-terminal primer 5′-gtc gac tag ccc ttg gcg gaa agc-3′ (SEQ ID NO:55). These primers introduced NdeI and SalI restriction sites that facilitated subcloning. The PCR reaction mixture used to amplify Thermus brockianus strain YS38 contained 2.5 μl of 10× Amplitaq reaction buffer (Applied Biosystems), 2 mM MgCl₂, 120 ng genomic DNA template, 0.2 mM (each) dNTPs, 20 pmol of each primer, and 1.25 units of Amplitaq in a 25 μl total reaction volume. The reaction was started by adding a premix containing enzyme, MgCl₂, dNTPs, buffer and water to another premix containing primer and template preheated at 80° C. The entire reaction mixture was then denatured (30 sec at 96° C.), followed by 30 PCR cycles (97° C. for 3 sec, 62° C. for 30 sec, 72° C. for 3 min) with a finishing step (72° C. for 7 min).
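
The restriction sites carried by these primers can be located programmatically. NdeI recognizes CATATG and SalI recognizes GTCGAC; both sites are palindromic, so scanning one strand is enough. In the N-terminal primer the ATG within the NdeI site also serves as the start codon. The helper below is a hypothetical sketch.

```python
SITES = {"NdeI": "CATATG", "SalI": "GTCGAC"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each site present.
    Both sites are palindromic, so one strand suffices."""
    s = seq.replace(" ", "").upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = [i for i in range(len(s) - len(site) + 1)
                     if s.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


N_PRIMER = "cat atg ctt ccc ctc ttt gag ccc a"  # SEQ ID NO:54
C_PRIMER = "gtc gac tag ccc ttg gcg gaa agc"    # SEQ ID NO:55
print(find_sites(N_PRIMER))  # {'NdeI': [0]}
print(find_sites(C_PRIMER))  # {'SalI': [0]}
```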

The same primers used to amplify the polymerase gene from Thermus brockianus strain YS38 were used to amplify the polymerase gene from Thermus brockianus strain 2AZN. The 2AZN PCR reaction contained 5 μl of 10× Pfu Turbo reaction buffer (Stratagene), 200 ng genomic DNA template, 0.2 mM (each) dNTPs, 20 pmol of each primer, and 2.5 units of Pfu Turbo DNA polymerase (Stratagene) in a 50 μl total reaction volume. The reaction was started by adding a premix containing enzyme, dNTPs, buffer and water to another premix containing primer and template preheated at 80° C. The entire reaction mixture was then denatured (2 min, 96° C.), followed by 25 PCR cycles (96° C. for 5 sec, 64° C. for 30 sec, 72° C. for 3 min) with a finishing step (72° C. for 5 min).

Both PCR reactions produced amplified DNA fragments of approximately 2.5 kb. The amplified DNA fragments were purified from the PCR reaction mix using a Qiagen PCR cleanup kit (Qiagen Inc., Valencia, Calif.). The Thermus brockianus strain YS38 fragment was ligated into the inducible expression vector pCR®T7 CT-TOPO® (Invitrogen, Carlsbad, Calif.). The Thermus brockianus strain 2AZN fragment was ligated into the vector pCR4®TOPO®TA (Invitrogen, Carlsbad, Calif.). Three different clones were sequenced in order to rule out PCR errors. The sequence of the Thermus brockianus 2AZN polymerase gene is provided as SEQ ID NO:2. The consensus sequence of Thermus brockianus strain YS38 is provided as SEQ ID NO:1. The two sequences are compared in an alignment provided in FIG. 1. The sequences of both polymerase genes were reconfirmed by sequencing PCR fragments produced by reamplifying both full-length genes from their respective genomic DNAs.
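
Sequencing several independent clones guards against PCR errors because a random error is unlikely to recur at the same position in a majority of clones. The text does not say how the consensus was derived; the sketch below assumes a simple per-position majority vote over clone sequences that are already aligned and of equal length, with a hypothetical function name.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus of equal-length, pre-aligned sequences.
    Positions without a strict majority are reported as 'N'."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need one or more reads of equal length")
    out = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count * 2 > len(column) else "N")
    return "".join(out)


# Toy example: the third clone carries a single PCR error at position 3.
print(consensus(["ACGTAC", "ACGTAC", "ACCTAC"]))  # ACGTAC
```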

The deduced amino acid sequences of Thermus brockianus YS38 and Thermus brockianus 2AZN were aligned with the polymerase enzymes from Thermus aquaticus (Taq), Thermus thermophilus (Tth), Thermus filiformis (Tfi) and Thermus flavus using the program Vector NTI (Informax, Inc.). The alignment is shown in FIG. 2. There are 44 amino acid positions where Thermus brockianus YS38 and/or Thermus brockianus 2AZN differ from the published sequences of other known Thermus polymerases. In addition, the Thermus brockianus polymerases have a different start site from the others, which accounts for the different amino acid numbering.
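
Given an alignment such as the one in FIG. 2, positions of this kind can be counted column by column. The criterion used here (the query residue appears in none of the reference sequences at that column) is an assumption, since the text does not define exactly how a difference was scored; the function name and the toy sequences are hypothetical.

```python
def divergent_positions(query, references):
    """1-based alignment columns where `query` has a residue (not a gap)
    that appears in none of the aligned reference sequences."""
    seqs = [query, *references]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [i + 1 for i, col in enumerate(zip(*seqs))
            if col[0] != "-" and col[0] not in col[1:]]


# Toy alignment (not real polymerase sequence): only column 2 is unique.
print(divergent_positions("MKV-LE", ["MRV-LE", "MRVALE"]))  # [2]
```

To reproduce an "and/or" count over two strains, the column sets returned for each strain would be combined with a set union.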

Modification of Thermus brockianus Polymerase Coding Regions

To produce Thermus brockianus polymerase from strains YS38 and 2AZN in a form better suited for dye-terminator DNA sequencing, two amino acid substitutions were separately made in the nucleic acids encoding these polymerases. These are the F-to-Y mutation (Tabor and Richardson, 1995 PNAS 92:6339-6343; U.S. Pat. No. 5,614,365) and the exo-minus mutation (see U.S. Pat. No. 5,466,591; Xu Y., Derbyshire V., Ng K., Sun X-C., Grindley N. D., Joyce C. M. (1997) J. Mol. Biol. 268, 284-302). To reduce the 5′-3′ exonuclease activity to very low levels, the mutation G43D was introduced. To reduce the discrimination between ddNTPs and dNTPs, the mutation F665Y was introduced.
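
Substitutions such as G43D and F665Y use the usual wild-type residue, 1-based position, mutant residue notation. The sketch below applies such a substitution to a protein sequence and first checks that the expected wild-type residue is present, which guards against numbering mismatches such as those caused by a different start site. The function name and the toy sequence are hypothetical.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein, mutation):
    """Apply a substitution written as <wild-type><position><mutant>,
    e.g. 'F665Y'. Positions are 1-based; the wild-type residue is checked."""
    m = MUTATION.match(mutation)
    if not m:
        raise ValueError(f"not a point substitution: {mutation!r}")
    wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError("position outside the sequence")
    if protein[pos - 1] != wt:
        raise ValueError(f"expected {wt} at {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + mut + protein[pos:]


print(apply_substitution("MAGK", "G3D"))  # MADK (toy sequence)
```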

Mutagenesis of the Thermus brockianus polymerase genes was carried out using the modified QuickChange™ (Stratagene) PCR mutagenesis protocol described in Sawano & Miyawaki (2000). The mutagenized gene was resequenced completely to confirm the introduction of the mutations and to ensure that no PCR errors were introduced.
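
Confirming a mutagenized gene amounts to comparing it codon by codon with the starting sequence and checking that the only changed codons are the intended ones; any extra change, even a silent one, would indicate a PCR error. The sketch assumes in-frame coding sequences of equal length beginning at the start codon, so that codon numbers match amino acid numbers; the names and the toy sequences are hypothetical.

```python
def codon_changes(reference, mutant):
    """List (codon_number, ref_codon, mutant_codon) for every codon that
    differs between two in-frame coding sequences of equal length."""
    ref, mut = reference.upper(), mutant.upper()
    if len(ref) != len(mut) or len(ref) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    return [(i // 3 + 1, ref[i:i + 3], mut[i:i + 3])
            for i in range(0, len(ref), 3) if ref[i:i + 3] != mut[i:i + 3]]


def only_intended(reference, mutant, intended_codons):
    """True if exactly the intended codons changed, and nothing else."""
    changed = {n for n, _, _ in codon_changes(reference, mutant)}
    return changed == set(intended_codons)


ref = "ATGGGCTTCAAA"  # toy: M G F K
mut = "ATGGACTTCAAA"  # codon 2 GGC -> GAC (G -> D)
print(codon_changes(ref, mut))  # [(2, 'GGC', 'GAC')]
```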

EXAMPLE 2 Protein Expression and Purification

Nucleic acids encoding both Thermus brockianus open reading frames were separately subcloned into the expression vector pET24a (Novagen, Madison, Wis.) using the NdeI and SalI restriction sites. These plasmids were then used to transform BL21 E. coli cells. The cells were grown in one liter of Terrific Broth (Maniatis) to an optical density (OD) of 1.2, and the protein was overproduced by four-hour induction with 1.0 mM IPTG. The cells were harvested by centrifugation, washed in 50 mM Tris (pH 7.5), 5 mM EDTA, 5% glycerol, 10 mM EDTA to remove growth media, and the cell pellet was frozen at −80° C.

To isolate Thermus brockianus polymerase enzymes, the cells were thawed and resuspended in 2.5 volumes (wet weight) of 50 mM Tris (pH 7.2), 400 mM NaCl, 1 mM EDTA. The cell walls were disrupted by sonication. The resulting E. coli cell debris was removed by centrifugation. The cleared lysate was pasteurized in a water bath (75° C., 45 min), denaturing and precipitating the majority of the non-thermostable E. coli proteins and leaving the thermostable Thermus brockianus polymerase in solution. E. coli genomic DNA was removed by coprecipitation with 0.3% polyethyleneimine (PEI). The cleared lysate was then applied to two columns in series: (1) a Biorex 70 cation exchange resin (BioRad, Hercules, Calif.), which binds excess PEI, and (2) a heparin-agarose resin (Sigma, St. Louis, Mo.), which retains the polymerase. The heparin-agarose column was washed with 5 column volumes of 20 mM Tris (pH 8.5), 5% glycerol, 100 mM NaCl, 0.1 mM EDTA, 0.05% Triton X-100 and 0.05% Tween-20 (KTA buffer). The protein was then eluted with a 0.1 to 1.0 M NaCl linear gradient. The polymerase eluted at 0.8 M NaCl. The eluted Thermus brockianus polymerase enzymes were concentrated and the buffer exchanged using a Millipore concentration filter (30 kDa molecular weight cutoff). The concentrated protein was stored in KTA buffer (no salt) plus 50% glycerol at −20° C. The activity of the polymerases was measured using a nicked salmon sperm DNA radiometric activity assay.
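
In a linear salt gradient, the NaCl concentration rises in proportion to the fraction of the gradient delivered, so the point at which the polymerase elutes (0.8 M within a 0.1 to 1.0 M gradient) can be located by simple interpolation. The sketch below uses hypothetical helper names and defaults taken from the gradient limits above.

```python
def gradient_concentration(fraction, start_m=0.1, end_m=1.0):
    """NaCl molarity at a given fraction (0-1) of a linear gradient."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return start_m + fraction * (end_m - start_m)


def fraction_at(concentration_m, start_m=0.1, end_m=1.0):
    """Inverse: fraction of the gradient at which a concentration is reached."""
    if not min(start_m, end_m) <= concentration_m <= max(start_m, end_m):
        raise ValueError("concentration outside the gradient")
    return (concentration_m - start_m) / (end_m - start_m)


print(round(fraction_at(0.8), 3))  # 0.778
```

With a hypothetical 20-column-volume gradient, for example, the polymerase would be expected after about 15.6 column volumes of the gradient.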

All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the relevant arts are intended to be within the scope of the following claims.

1. An isolated nucleic acid encoding a nucleic acid polymerase comprising any one of amino acid sequences SEQ ID NO:9-16.
 2. The isolated nucleic acid of claim 1, wherein said nucleic acid polymerase comprises a mutation that decreases 5′-3′ exonuclease activity.
 3. The isolated nucleic acid of claim 2, wherein said decreased 5′-3′ exonuclease activity is relative to a nucleic acid polymerase that does not comprise said mutation.
 4. The isolated nucleic acid of claim 1, wherein said nucleic acid polymerase comprises a mutation that reduces discrimination against dideoxynucleotide triphosphates.
 5. The isolated nucleic acid of claim 4, wherein said reduced discrimination against dideoxynucleotide triphosphates is relative to a nucleic acid polymerase that does not comprise said mutation.
 6. An isolated nucleic acid comprising the nucleotide sequence of any one of SEQ ID NO:1-8 or a nucleotide sequence complementary to any one of SEQ ID NO:1-8.
 7. (canceled)
 8. An isolated nucleic acid encoding a nucleic acid polymerase from Thermus brockianus comprising an amino acid sequence with at least 96% identity to any one of SEQ ID NO:9-16.
 9. A vector comprising the isolated nucleic acid of claim 8.
 10.-16. (canceled)
 17. An expression vector comprising a promoter operably linked to the isolated nucleic acid of claim 8.
 18.-24. (canceled)
 25. A host cell comprising the isolated nucleic acid of claim 8.
 26.-54. (canceled)